cout on extra thread - thread safety - c++

I have a very time sensitive task in my main thread. However, I would also like to simultaneously print some information about this task.
Problem: cout takes some time to execute and therefore slows down the time sensitive task in the main thread.
Idea: I thought about creating an extra thread which handles the output. To communicate between the main thread and the newly created thread, I thought about a vector which includes the strings that should be printed. In the newly created thread an infinite while loop would print these strings one after another.
Problem with the idea: Vectors are not thread safe. Therefore, I'm worried that locking and unlocking the vector will take nearly as much time as it would take when calling cout directly in the main thread.
Question: Is there an alternative to locking/unlocking the vector? Are my worries regarding the locking of the vector misguided? Would you take a whole different approach to solve the problem?

Depending on how time-sensitive the task is, I'd probably build up a vector of outputs in the producer thread, then pass the whole vector to the consumer thread (and repeat as needed).
The queue between the two will need to be thread safe, but you can keep the overhead minuscule by passing a vector every, say, 50-100 ms or so. This is still short enough to look like real-time to most observers, but is long enough to keep the overhead of locking far too low to care about in most cases.

You could use the idea one often sees in "interrupt" programming - send data from the thread to a ring buffer. Then, in another thread, print from the ring buffer. Actually, in the "good old days", one could write a ring buffer without any "atomics" (and can still do on some embedded systems).
But, even with atomics, ring buffers are not hard to write. There is one implementation here: c++ threadsafe ringbuffer implementation (untested, but on the first look seems OK).

Related

Multirate threads

I ran recently into a requirement in which there is a need for multithreaded application whose threads run at different rates.
The questions then become, since i am still learning multithreading:
A scenario is given to put things into perspective:
Say 1st thread runs at 100 Hz "real time"
2nd runs at 10 Hz
and say that the 1st thread provides data "myData" to the 2nd thread.
How is myData going to be provided to the 2nd thread, is the common practice to just read whatever is available from the first thread, or there need to be some kind of decimation to reduce the rate.
Does the myData need to be some kind of Singleton with locking mechanism. Although myData isn't shared, but rather updated by the first thread and used in the second thread.
How about the opposite case, when the data used in one thread need to be used at higher rate in a different thread.
How is myData going to be provided to the 2nd thread
One common method is to provide a FIFO queue -- this could be a std::dequeue or a linked list, or whatever -- and have the producer thread push data items onto one end of the queue while the consumer thread pops the data items off of the other end of the queue. Be sure to serialize all accesses to the FIFO queue (using a mutex or similar locking mechanism), to avoid race conditions.
Alternatively, instead of a queue you could have a single shared data object (essentially a queue of length one) and have your producer thread overwrite the object every time it generates new data. This could be done in cases where it's not important that the consumer thread sees every piece of data that was generated, but rather it's only important that it sees the most recent data. You'd still need to do the locking, though, to avoid the risk of the consumer thread reading from the data object at the same time the producer thread is in the middle of writing to it.
or does there need to be some kind of decimation to reduce the rate.
There doesn't need to be any decimation -- the second thread can just read in as much data as there is available to read, whenever it wakes up.
Does the myData need to be some kind of Singleton with locking
mechanism.
Singleton isn't necessary (although it's possible to do it that way). The locking mechanism is necessary, unless you have some kind of lock-free synchronization mechanism (and if you're asking this level of question, you don't have one and you don't want to try to get one either -- keep things simple for now!)
How about the opposite case, when the data used in one thread need to
be used at higher rate in a different thread.
It's the same -- if you're using a proper inter-thread communications mechanism, the rates at which the threads wake up doesn't matter, because the communications mechanism will do the right thing regardless of when or how often the the threads wake up.
Any multithreaded program has to cope with the possibility that one of the threads will work faster than another - by any ratio - even if they're executing on the same CPU with the same clock frequency.
Your choices include:
producer-consumer container than lets the first thread enqueue data, and the second thread "pop" it off for processing: you could let the queue grow as large as memory allows, or put some limit on the size after which either data would be lost or the 1st thread would be forced to slow down and wait to enqueue further values
there are libraries available (e.g. boost), or if you want to implement it yourself google some tutorials/docs on mutex and condition variables
do something conceptually similar to the above but where the size limit is 1 so there's just the single myData variable rather than a "container" - but all the synchronisation and delay choices remain the same
The Singleton pattern is orthogonal to your needs here: the two threads do need to know where the data is, but that would normally be done using e.g. a pointer argument to the function(s) run in the threads. Singleton's easily overused and best avoided unless reasons stack up high....

How to implement platform independent asynchronous write to file?

I am creating a program that will receive messages from a remote machine and needs to write the messages to a file on disk. The difficulty I am finding lies in the fact that the aim of this program is to test the performance of the library that receives the messages and, therefore, I need to make sure that writing the messages to disk does not affect the performance of the library. The library delivers the messages to the program via a callback function. One other difficulty is that the solution must be platform independent.
What options do I have?
I thought of the following:
using boost:asio to write to file, but it seems (see this documentation) that asynchronous write to file is in the Windows specific part of this library - so this cannot be used.
Using boost::interprocess to create a message queue but this documentation indicates that there are 3 methods in which the messages can be sent, and all methods would require the program to block (implicitly or not) if the message queue is full, which I cannot risk.
creating a std::deque<MESSAGES> to push to the deque from the callback function, and pop the messages out while writing to the file (on a separate thread), but STL containers are not guaranteed to be thread-safe. I could lock the pushing onto, and popping off, the deque but we are talking about 47 microseconds between successive messages so I would like to avoid locks altogether.
Does anyone have any more ideas on possible solutions?
STL containers may not be thread-safe but I haven't ever hit one that cannot be used at different times on different threads. Passing the ownership to another thread seems safe.
I have used the following a couple of times, so I know it works:
Create a pointer to a std::vector.
Create a mutex lock to protect the vector pointer.
Use new[] to create a std::vector and then reserve() a large size for it.
In the receiver thread:
Lock the mutex whenever adding an item to the queue. This should be a short lock.
Add queue item.
Release the lock.
If you feel like it signal a condition variable. I sometimes don't: it depends on the design. If the volume is very high and there is no pause in the receive side just skip the condition and poll instead.
On the consumer thread (the disk writer):
Go look for work to do by polling or waiting on a condition variable:
Lock the queue mutex.
Look at the queue length.
If there is work in the queue assign the pointer to a variable in the consumer thread.
Use new[] and reserve() to create a new queue vector and assign it to the queue pointer.
Unlock the mutex.
Go off and write your items to disk.
delete[] the used-up queue vector.
Now, depending on your problem you may end up needing a way to block. For example in one of my programs if the queue length ever hits 100,000 items, the producing thread just starts doing 1 second sleeps and complaining a lot. It is one of those things that shouldn't happen, yet does, so you should consider it. Without any limits at all it will just use all the memory on the machine and then crash with an exception, get killed by OOM or just come to a halt in a swap storm.
boost::thread is platform independent so you should be able to utilize that create a thread to do the blocking writes on. To avoid needing to lock the container every time a message is placed into the main thread you can utilize a modification on the double buffering technique by creating nested containers, such as:
std::deque<std::deque<MESSAGES> >
Then only lock the top level deque when a deque full of messages is ready to be added. The writing thread would in turn only lock the top level deque to pop off a deque full of messages to be written.

Thread safe programming

I keep hearing about thread safe. What is that exactly and how and where can I learn to program thread safe code?
Also, assume I have 2 threads, one that writes to a structure and another one that reads from it. Is that dangerous in any way? Is there anything I should look for? I don't think it is a problem. Both threads will not (well can't ) be accessing the struct at the exact same time..
Also, can someone please tell me how in this example : https://stackoverflow.com/a/5125493/1248779 we are doing a better job in concurrency issues. I don't get it.
It's a very deep topic. At the heart threads are usually about making things go fast by using multiple cores at the same time; or about doing long operations in the background when you don't have a good way to interleave the operation with a 'primary' thread. The latter being very common in UI programming.
Your scenario is one of the classic trouble spots, and one of the first people run into. It's vary rare to have a struct where the members are truly independent. It's very common to want to modify multiple values in the structure to maintain consistency. Without any precautions it is very possible to modify the first value, then have the other thread read the struct and operate on it before the second value has been written.
Simple example would be a 'point' struct for 2d graphics. You'd like to move the point from [2,2] to [5,6]. If you had a different thread drawing a line to that point you could end up drawing to [5,2] very easily.
This is the tip of the iceberg really. There are lots of great books, but learning this space usually goes something like this:
Uh oh, I just read from that thing in an inconsistent state.
Uh oh, I just modified that thing from 2 threads and now it's garbage.
Yay! I learned about locks
Whoa, I have a lot of locks and everything seems to just hang sometimes when I have lots of them locking in nested code.
Hrm. I need to stop doing this locking on the fly, I seem to be missing a lot of places; so I should encapsulate them in a data structure.
That data structure thing was great, but now I seem to be locking all the time and my code is just as slow as a single thread.
condition variables are weird
It's fast because I got clever with how I lock things. Hrm. Sometimes data corrupts.
Whoa.... InterlockedWhatDidYouSay?
Hey, look no lock, I do this thing called a spin lock.
Condition variables. Hrm... I see.
You know what, how about I just start thinking about how to operate on this stuff in completely independent ways, pipelineing my operations, and having as few cross thread dependencies as possible...
Obviously it's not all about condition variables. But there are many problems that can be solved with threading, and probably almost as many ways to do it, and even more ways to do it wrong.
Thread-safety is one aspect of a larger set of issues under the general heading of "Concurrent Programming". I'd suggest reading around that subject.
Your assumption that two threads cannot access the struct at the same time is not good. First: today we have multi-core machines, so two threads can be running at exactly the same time. Second: even on a single core machine the slices of time given to any other thread are unpredicatable. You have to anticipate that ant any arbitrary time the "other" thread might be processing. See my "window of opportunity" example below.
The concept of thread-safety is exactly to answer the question "is this dangerous in any way". The key question is whether it's possible for code running in one thread to get an inconsistent view of some data, that inconsistency happening because while it was running another thread was in the middle of changing data.
In your example, one thread is reading a structure and at the same time another is writing. Suppose that there are two related fields:
{ foreground: red; background: black }
and the writer is in the process of changing those
foreground = black;
<=== window of opportunity
background = red;
If the reader reads the values at just that window of opportunity then it sees a "nonsense" combination
{ foreground: black; background: black }
This essence of this pattern is that for a brief time, while we are making a change, the system becomes inconsistent and readers should not use the values. As soon as we finish our changes it becomes safe to read again.
Hence we use the CriticalSection APIs mentioned by Stefan to prevent a thread seeing an inconsistent state.
what is that exactly?
Briefly, a program that may be executed in a concurrent context without errors related to concurrency.
If ThreadA and ThreadB read and/or write data without errors and use proper synchronization, then the program may be threadsafe. It's a design choice -- making an object threadsafe can be accomplished a number of ways, and more complex types may be threadsafe using combinations of these techniques.
and how and where can I learn to program thread safe code?
boost/libs/thread/ would likely be a good introduction. The topic is quite complex.
The C++11 standard library provides implementations for locks, atomics and threads -- any well written programs which use these would be a good read. The standard library was modeled after boost's implementation.
also, assume I have 2 threads one that writes to a structure and another one that reads from it. Is that dangerous in any way? is there anything I should look for?
Yes, it can be dangerous and/or may produce incorrect results. Just imagine that a thread may run out of its time at any point, and then another thread could then read or modify that structure -- if you have not protected it, it may be in the middle of an update. A common solution is a lock, which can be used to prevent another thread from accessing shared resources during reads/writes.
When writing multithreaded C++ programs on WIN32 platforms, you need to protect certain shared objects so that only one thread can access them at any given time from different threads. You can use 5 system functions to achieve this. They are InitializeCriticalSection, EnterCriticalSection, TryEnterCriticalSection, LeaveCriticalSection, and DeleteCriticalSection.
Also maybe this links can help:
how to make an application thread safe?
http://www.codeproject.com/Articles/1779/Making-your-C-code-thread-safe
Thread safety is a simple concept: is it "safe" to perform operation A on one thread whilst another thread is performing operation B, which may or may not be the same as operation A. This can be extended to cover many threads. In this context, "safe" means:
No undefined behaviour
All invariants of the data structures are guaranteed to be observed by the threads
The actual operations A and B are important. If two threads both read a plain int variable, then this is fine. However, if any thread may write to that variable, and there is no synchronization to ensure that the read and write cannot happen together, then you have a data race, which is undefined behaviour, and this is not thread safe.
This applies equally to the scenario you asked about: unless you have taken special precautions, then it is not safe to have one thread read from a structure at the same time as another thread writes to it. If you can guarantee that the threads cannot access the data structure at the same time, through some form of synchronization such as a mutex, critical section, semaphore or event, then there is not a problem.
You can use things like mutexes and critical sections to prevent concurrent access to some data, so that the writing thread is the only thread accessing the data when it is writing, and the reading thread is the only thread accessing the data when it is reading, thus providing the guarantee I just mentioned. This therefore avoids the undefined behaviour mentioned above.
However, you still need to ensure that your code is safe in the wider context: if you need to modify more than one variable then you need to hold the lock on the mutex across the whole operation rather than for each individual access, otherwise you may find that the invariants of your data structure may not be observed by other threads.
It is also possible that a data structure may be thread safe for some operations but not others. For example, a single-producer single-consumer queue will be OK if one thread is pushing items on the queue and another is popping items off the queue, but will break if two threads are pushing items, or two threads are popping items.
In the example you reference, the point is that global variables are implicitly shared between all threads, and therefore all accesses must be protected by some form of synchronization (such as a mutex) if any thread can modify them. On the other hand, if you have a separate copy of the data for each thread, then that thread can modify its copy without worrying about concurrent access from any other thread, and no synchronization is required. Of course, you always need synchronization if two or more threads are going to operate on the same data.
My book, C++ Concurrency in Action covers what it means for things to be thread safe, how to design thread safe data structures, and the C++ synchronization primitives used for the purpose, such as std::mutex.
Threads safe is when a certain block of code is protected from being accessed by more than one thread. Meaning that the data manipulated always stays in a consistent state.
A common example is the producer consumer problem where one thread reads from a data structure while another thread writes to the same data structure : Detailed explanation
To answer the second part of the question: Imagine two threads both accessing std::vector<int> data:
//first thread
if (data.size() > 0)
{
std::cout << data[0]; //fails if data.size() == 0
}
//second thread
if (rand() % 5 == 0)
{
data.clear();
}
else
{
data.push_back(1);
}
Run these threads in parallel and your program will crash because std::cout << data[0]; might be executed directly after data.clear();.
You need to know that at any point of your thread code, the thread might be interrupted, e.g. after checking that (data.size() > 0), and another thread could become active. Although the first thread looks correct in a single threaded app, it's not in a multi-threaded program.

how to synchronize three dependent threads

If I have
1. mainThread: write data A,
2. Thread_1: read A and write it to into a Buffer;
3. Thread_2: read from the Buffer.
how to synchronize these three threads safely, with not much performance loss? Is there any existing solution to use? I use C/C++ on linux.
IMPORTANT: the goal is to know the synchronization mechanism or algorithms for this particular case, not how mutex or semaphore works.
First, I'd consider the possibility of building this as three separate processes, using pipes to connect them. A pipe is (in essence) a small buffer with locking handled automatically by the kernel. If you do end up using threads for this, most of your time/effort will be spent on creating nearly an exact duplicate of the pipes that are already built into the kernel.
Second, if you decide to build this all on your own anyway, I'd give serious consideration to following a similar model anyway. You don't need to be slavish about it, but I'd still think primarily in terms of a data structure to which one thread writes data, and from which another reads the data. By strong preference, all the necessary thread locking necessary would be built into that data structure, so most of the code in the thread is quite simple, reading, processing, and writing data. The main difference from using normal Unix pipes would be that in this case you can maintain the data in a more convenient format, instead of all the reading and writing being in text.
As such, what I think you're looking for is basically a thread-safe queue. With that, nearly everything else involved becomes borders on trivial (at least the threading part of it does -- the processing involved may not be, but at least building it with multiple threads isn't adding much to the complexity).
It's hard to say how much experience with C/C++ threads you have. I hate to just point to a link but have you read up on pthreads?
https://computing.llnl.gov/tutorials/pthreads/
And for a shorter example with code and simple mutex'es (lock object you need to sync data):
http://students.cs.byu.edu/~cs460ta/cs460/labs/pthreads.html
I would suggest Boost.Thread for this purpose. This is quite good framework with mutexes and semaphores, and it is multiplatform. Here you can find very good tutorial about this.
How exactly synchronize these threads is another problem and needs more information about your problem.
Edit The simplest solution would be to put two mutexes -- one on A and second on Buffer. You don't have to worry about deadlocks in this particular case. Just:
Enter mutex_A from MainThread; Thread1 waits for mutex to be released.
Leave mutex from MainThread; Thread1 enters mutex_A and mutex_Buffer, starts reading from A and writes it to Buffer.
Thread1 releases both mutexes. ThreadMain can enter mutex_A and write data, and Thread2 can enter mutex_Buffer safely read data from Buffer.
This is obviously the simplest solution, and probably can be improved, but without more knowledge about the problem, this is the best I can come up with.

How can I improve my real-time behavior in multi-threaded app using pthreads and condition variables?

I have a multi-threaded application that is using pthreads. I have a mutex() lock and condition variables(). There are two threads, one thread is producing data for the second thread, a worker, which is trying to process the produced data in a real time fashion such that one chuck is processed as close to the elapsing of a fixed time period as possible.
This works pretty well, however, occasionally when the producer thread releases the condition upon which the worker is waiting, a delay of up to almost a whole second is seen before the worker thread gets control and executes again.
I know this because right before the producer releases the condition upon which the worker is waiting, it does a chuck of processing for the worker if it is time to process another chuck, then immediately upon receiving the condition in the worker thread, it also does a chuck of processing if it is time to process another chuck.
In this later case, I am seeing that I am late processing the chuck many times. I'd like to eliminate this lost efficiency and do what I can to keep the chucks ticking away as close to possible to the desired frequency.
Is there anything I can do to reduce the delay between the release condition from the producer and the detection that that condition is released such that the worker resumes processing? For example, would it help for the producer to call something to force itself to be context switched out?
Bottom line is the worker has to wait each time it asks the producer to create work for itself so that the producer can muck with the worker's data structures before telling the worker it is ready to run in parallel again. This period of exclusive access by the producer is meant to be short, but during this period, I am also checking for real-time work to be done by the producer on behalf of the worker while the producer has exclusive access. Somehow my hand off back to running in parallel again results in significant delay occasionally that I would like to avoid. Please suggest how this might be best accomplished.
I could suggest the following pattern. Generally the same technique could be used, e.g. when prebuffering frames in some real-time renderers or something like that.
First, it's obvious that approach that you describe in your message would only be effective if both of your threads are loaded equally (or almost equally) all the time. If not, multi-threading would actually benefit in your situation.
Now, let's think about a thread pattern that would be optimal for your problem. Assume we have a yielding and a processing thread. First of them prepares chunks of data to process, the second makes processing and stores the processing result somewhere (not actually important).
The effective way to make these threads work together is the proper yielding mechanism. Your yielding thread should simply add data to some shared buffer and shouldn't actually care about what would happen with that data. And, well, your buffer could be implemented as a simple FIFO queue. This means that your yielding thread should prepare data to process and make a PUSH call to your queue:
X = PREPARE_DATA()
BUFFER.LOCK()
BUFFER.PUSH(X)
BUFFER.UNLOCK()
Now, the processing thread. It's behaviour should be described this way (you should probably add some artificial delay like SLEEP(X) between calls to EMPTY)
IF !EMPTY(BUFFER) PROCESS(BUFFER.TOP)
The important moment here is what should your processing thread do with processed data. The obvious approach means making a POP call after the data is processed, but you will probably want to come with some better idea. Anyway, in my variant this would look like
// After data is processed
BUFFER.LOCK()
BUFFER.POP()
BUFFER.UNLOCK()
Note that locking operations in yielding and processing threads shouldn't actually impact your performance because they are only called once per chunk of data.
Now, the interesting part. As I wrote at the beginning, this approach would only be effective if threads act somewhat the same in terms of CPU / Resource usage. There is a way to make these threading solution effective even if this condition is not constantly true and matters on some other runtime conditions.
This way means creating another thread that is called controller thread. This thread would merely compare the time that each thread uses to process one chunk of data and balance the thread priorities accordingly. Actually, we don't have to "compare the time", the controller thread could simply work the way like:
IF BUFFER.SIZE() > T
DECREASE_PRIORITY(YIELDING_THREAD)
INCREASE_PRIORITY(PROCESSING_THREAD)
Of course, you could implement some better heuristics here but the approach with controller thread should be clear.