Critical sections better in thread or main program? - c++

I usually use critical sections (in C++) to block thread execution while shared data is being accessed, but since a thread has to wait until the data is no longer in use before it can take the lock, I'm wondering whether it is better to use them in the main program or in the thread.
So if I want my main program to have priority and not be blocked, should I use critical sections inside it to block the other threads, or the other way around?

You seem to have rather a misconception over what critical sections are and how they work.
Speaking generically, a critical section (CS) is a piece of code that needs to run "exclusively" -- i.e., you need to ensure that only one thread is executing that piece of code at any given time.
As the term is used in most environments, a CS is really a mutex -- a mutual exclusion semaphore (aka binary semaphore). It's a data structure (and set of functions) you use to ensure that a section of code gets executed exclusively (rather than referring to the code itself).
In any case, a CS only makes sense at all when/if you have some code that will execute in more than one thread, and you need to ensure that it only ever executes in one thread at any given time. This is typically when you have some shared data that could and would be corrupted if more than one thread tried to manipulate it at one time. When/if that arises, you need to "use" the critical section for every thread that manipulates that data to assure that the shared data isn't corrupted.
Assuring that a particular thread remains responsive is a whole separate question. In most cases, this means using a queue (for one possibility) to allow the thread to "hand off" a task to some other thread quickly, with minimal contention (i.e., instead of using a CS for the duration of processing the data, the CS only lasts long enough to put a data structure into a queue, and some other thread takes the processing from there).
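As a rough illustration of that hand-off idea, here is a minimal sketch (C++11; the Task type and queue names are invented for the example) where the critical section lasts only as long as the push:

#include <mutex>
#include <queue>

struct Task { /* whatever the worker needs to do the real processing */ };

std::queue<Task> g_tasks;      // shared between the responsive thread and a worker
std::mutex       g_tasksMutex;

void handOff(const Task& t)
{
    // The "critical section" is only the push itself, so the responsive
    // thread never blocks for the duration of the actual processing.
    std::lock_guard<std::mutex> lock(g_tasksMutex);
    g_tasks.push(t);
}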

You cannot say "I am using a critical section in thread A but not in thread B". A critical section is a piece of code that accesses a shared resource. When this code is executed from two threads that run in parallel, the shared resource might get corrupted, so you need to synchronise access to it: you need to use one of the synchronisation objects (mutexes, semaphores, events... depending on the platform and API you are using). Thread A locks the critical section, so thread B needs to wait until thread A releases it.
If you want your main thread to block (wait) less than the working thread, set the working thread's priority lower than the main thread's priority.
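If you are on Windows (the question doesn't say which platform), lowering the worker's priority might look roughly like this; workerHandle is just a placeholder for however you obtained the thread handle:

#include <windows.h>

// workerHandle is assumed to be a valid thread handle, e.g. returned by CreateThread.
void lowerWorkerPriority(HANDLE workerHandle)
{
    // The main thread keeps THREAD_PRIORITY_NORMAL, so the scheduler
    // favours it whenever both threads are ready to run.
    SetThreadPriority(workerHandle, THREAD_PRIORITY_BELOW_NORMAL);
}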

Related

Multirate threads

I recently ran into a requirement for a multithreaded application whose threads run at different rates.
The questions then become (since I am still learning multithreading):
A scenario is given to put things into perspective:
Say 1st thread runs at 100 Hz "real time"
2nd runs at 10 Hz
and say that the 1st thread provides data "myData" to the 2nd thread.
How is myData going to be provided to the 2nd thread? Is the common practice to just read whatever is available from the first thread, or does there need to be some kind of decimation to reduce the rate?
Does myData need to be some kind of Singleton with a locking mechanism? (myData isn't really shared so much as updated by the first thread and used by the second thread.)
How about the opposite case, when the data used in one thread needs to be used at a higher rate in a different thread?
How is myData going to be provided to the 2nd thread
One common method is to provide a FIFO queue -- this could be a std::deque or a linked list, or whatever -- and have the producer thread push data items onto one end of the queue while the consumer thread pops the data items off of the other end of the queue. Be sure to serialize all accesses to the FIFO queue (using a mutex or similar locking mechanism), to avoid race conditions.
Alternatively, instead of a queue you could have a single shared data object (essentially a queue of length one) and have your producer thread overwrite the object every time it generates new data. This could be done in cases where it's not important that the consumer thread sees every piece of data that was generated, but rather it's only important that it sees the most recent data. You'd still need to do the locking, though, to avoid the risk of the consumer thread reading from the data object at the same time the producer thread is in the middle of writing to it.
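A minimal sketch of that second, queue-of-length-one approach (the SensorSample and LatestData names are invented for the example):

#include <mutex>

struct SensorSample { double value; };   // stand-in for "myData"

class LatestData {
public:
    // Producer (100 Hz thread): overwrite whatever is there with the newest sample.
    void write(const SensorSample& s) {
        std::lock_guard<std::mutex> lock(m_);
        latest_ = s;
    }
    // Consumer (10 Hz thread): copy out the most recent sample whenever it wakes up.
    SensorSample read() const {
        std::lock_guard<std::mutex> lock(m_);
        return latest_;
    }
private:
    mutable std::mutex m_;
    SensorSample latest_{};
};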
or does there need to be some kind of decimation to reduce the rate.
There doesn't need to be any decimation -- the second thread can just read in as much data as there is available to read, whenever it wakes up.
Does myData need to be some kind of Singleton with a locking mechanism?
Singleton isn't necessary (although it's possible to do it that way). The locking mechanism is necessary, unless you have some kind of lock-free synchronization mechanism (and if you're asking this level of question, you don't have one and you don't want to try to get one either -- keep things simple for now!)
How about the opposite case, when the data used in one thread needs to be used at a higher rate in a different thread?
It's the same -- if you're using a proper inter-thread communications mechanism, the rates at which the threads wake up don't matter, because the communications mechanism will do the right thing regardless of when or how often the threads wake up.
Any multithreaded program has to cope with the possibility that one of the threads will work faster than another - by any ratio - even if they're executing on the same CPU with the same clock frequency.
Your choices include:
a producer-consumer container that lets the first thread enqueue data, and the second thread "pop" it off for processing: you could let the queue grow as large as memory allows, or put some limit on the size after which either data would be lost or the 1st thread would be forced to slow down and wait to enqueue further values
there are libraries available (e.g. boost), or if you want to implement it yourself google some tutorials/docs on mutexes and condition variables (see the sketch after this list)
do something conceptually similar to the above but where the size limit is 1 so there's just the single myData variable rather than a "container" - but all the synchronisation and delay choices remain the same
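A rough sketch of such a size-limited producer-consumer queue, using a mutex and condition variables from the C++11 standard library rather than boost (the class and member names are invented for the example); the producer here takes the "slow down and wait" option when the queue is full:

#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t maxSize) : maxSize_(maxSize) {}

    // Producer: blocks (the "forced to slow down" choice) while the queue is full.
    void push(T value) {
        std::unique_lock<std::mutex> lock(m_);
        notFull_.wait(lock, [this] { return q_.size() < maxSize_; });
        q_.push_back(std::move(value));
        notEmpty_.notify_one();
    }

    // Consumer: blocks until there is something to take.
    T pop() {
        std::unique_lock<std::mutex> lock(m_);
        notEmpty_.wait(lock, [this] { return !q_.empty(); });
        T value = std::move(q_.front());
        q_.pop_front();
        notFull_.notify_one();
        return value;
    }

private:
    std::mutex m_;
    std::condition_variable notFull_, notEmpty_;
    std::deque<T> q_;
    std::size_t maxSize_;
};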
The Singleton pattern is orthogonal to your needs here: the two threads do need to know where the data is, but that would normally be done using e.g. a pointer argument to the function(s) run in the threads. Singleton's easily overused and best avoided unless reasons stack up high....

Best practices for inter thread communication of large objects

I need to be extremely concerned with speed/latency in my current multi-threaded project.
Consider the following inter-thread communication:
Thread #1: Processes a large object, call it A, that it receives from network events. This data significantly alters its internal state.
Thread #2: Needs to know about the current, altered internal state of object A to make some set of decisions.
I can think of essentially two methods to proceed:
Thread #2 has a pointer to object A, and when signaled (say, through constantly checking a small object sent via a lockfree queue or by checking a shared atomic bool), Thread #2 locks object A and reads it.
Thread #1 pushes some version of a copy of the object onto the lockfree queue so that thread #2 can simply receive it directly, use it, and dispose of the copy when it is finished.
Method #1 avoids the costly copy of a large object needed for Method #2, but it always requires a lock/unlock and, if I understand things correctly, additional L3 cache hits.
I understand there may not be any simple performance answers... I may simply need to benchmark. I am mostly interested in best-practices advice. Specifically, I'd like to know how to think about the problem at the cache-memory level, to understand how the information is being passed and copied internally.
It really depends on how much information thread 2 is trying to access from thread 1. If it needs complete access to all of thread 1's objects/memory, then yes, copying all of that is costly and I'd use a lock. But it also depends on the timing. If all thread 2 needs is to check a state, like a single variable, I would have thread 1 update a smaller object outside of itself that can be shared without locks, since thread 2 only reads while thread 1 only writes.
Also, you might not even need locking if thread 2 only reads and is OK with making a decision on a data state as it updates.
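One caveat worth adding: in C++11 and later, a plain variable written by one thread and read by another is formally a data race, so the lock-free single-variable idea is usually expressed with std::atomic. A minimal sketch (the names are invented for the example):

#include <atomic>

// Small piece of state published by thread 1 and read by thread 2.
std::atomic<int> g_currentState{0};

// Thread 1 (writer): publish the new state after it has finished updating object A.
void publishState(int newState)
{
    g_currentState.store(newState, std::memory_order_release);
}

// Thread 2 (reader): cheap check, no lock taken.
int readState()
{
    return g_currentState.load(std::memory_order_acquire);
}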

CreateMutex Confusion

So I have called CreateMutex like so:
while (1) {
    HANDLE h;
    h = CreateMutex(NULL, TRUE, "mutex1");
    y = WaitForSingleObject(h, INFINITE);
    // random code
    ReleaseMutex(h);
}
It runs fine after looping twice, but deadlocks on WaitForSingleObject(h, INFINITE) after the third loop. This is with two threads running concurrently. How can it deadlock when ReleaseMutex is called? Is the CreateMutex function called correctly?
You're waiting on a mutex that's already owned... please don't do that.
Also, you're not destroying the mutex, only releasing it. The next call should give you ERROR_ALREADY_EXISTS. The complete quote from MSDN is "If the mutex is a named mutex and the object existed before this function call, the return value is a handle to the existing object, GetLastError returns ERROR_ALREADY_EXISTS, bInitialOwner is ignored, and the calling thread is not granted ownership."
If any of the "random code" waits for the other thread to make progress, it could deadlock while owning the mutex. In which case the other thread will wait forever trying to acquire the mutex, which is the behavior you're seeing.
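For reference, the usual shape of this pattern is to create the mutex once, unowned, and only then wait and release around the protected work. A minimal sketch (error handling omitted):

#include <windows.h>

// Created once, before the threads start; bInitialOwner = FALSE so that
// nobody owns the mutex until a thread explicitly waits on it.
HANDLE g_mutex = CreateMutex(NULL, FALSE, NULL);

void doProtectedWork()
{
    WaitForSingleObject(g_mutex, INFINITE);   // acquire
    // ... touch the shared data here ...
    ReleaseMutex(g_mutex);                    // release, so the other thread can get in
}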
I suspect you are trying to implement mutual exclusion within a single process. If that is so, then the correct synchronization object is the critical section. The naming of these objects is a little confusing because both mutexes and critical sections perform mutual exclusion.
The interface for the critical section is much simpler to use, it being essentially an acquire function and a corresponding release function. If you are synchronizing within a single process, and you need a simple lock (rather than, say, a semaphore), you should use critical sections rather than mutexes.
In fact, very recently here on Stack Overflow, I wrote a more detailed answer to a question which described the standard usage pattern for critical sections. That post has lots of links to the pertinent sections of MSDN documentation.
You only need to use a mutex when you are performing cross-process synchronization. Indeed you should only use a mutex when you are synchronizing across processes, because critical sections perform so much better (i.e. faster).
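The standard usage pattern referred to above looks roughly like this (a sketch; the shared counter is just a placeholder for whatever data you are protecting):

#include <windows.h>

CRITICAL_SECTION g_cs;        // one critical section guarding some shared data
int g_sharedCounter = 0;

void setup()    { InitializeCriticalSection(&g_cs); }   // once, before the threads start
void teardown() { DeleteCriticalSection(&g_cs); }       // once, after the threads have finished

void touchSharedData()
{
    EnterCriticalSection(&g_cs);
    ++g_sharedCounter;        // only one thread at a time gets here
    LeaveCriticalSection(&g_cs);
}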

SetThreadAffinityMask of pooled thread

I am wondering whether it is possible to set the processor affinity of a thread obtained from a thread pool. More specifically, the thread is obtained through the use of the TimerQueue API, which I use to implement periodic tasks.
As a sidenote: I found TimerQueues the easiest way to implement periodic tasks, but since these are usually long-lived tasks, might it be more appropriate to use dedicated threads for this purpose? Furthermore, it is anticipated that synchronization primitives such as semaphores and mutexes will need to be used to synchronize the various periodic tasks. Are the pooled threads suitable for this?
Thanks!
EDIT1: As Leo has pointed out, the above question is actually two only loosely related ones. The first one is related to the processor affinity of pooled threads. The second question is related to whether pooled threads obtained from the TimerQueue API behave just like manually created threads when it comes to synchronization objects. I will move this second question to a separate topic.
If you do this, make sure you return things to how they were every time you release a thread back to the pool, since you don't own those threads and other code which uses them may have other requirements/assumptions.
Are you sure you actually need to do this, though? It's very, very rare to need to set processor affinity. (I don't think I've ever needed to do it in anything I've written.)
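If you do decide you need it, a minimal save-and-restore sketch for the duration of one callback might look like this (SetThreadAffinityMask returns the previous mask, which is what lets you put things back; the mask value 1 is just an illustration):

#include <windows.h>

void timerCallbackBody()
{
    // Pin this pooled thread to CPU 0 only while our work runs.
    DWORD_PTR previous = SetThreadAffinityMask(GetCurrentThread(), 1);

    // ... the actual periodic work ...

    // Restore whatever the pool had before handing the thread back.
    if (previous != 0)
        SetThreadAffinityMask(GetCurrentThread(), previous);
}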
Thread affinity can mean two quite different things. (Thanks to bk1e's comment to my original answer for pointing this out. I hadn't realised myself.)
What I would call processor affinity: Where a thread needs to be run consistently on the same processor. This is what SetThreadAffinityMask deals with and it's very rare for code to care about it. (Usually it's due to very low-level issues like CPU caching in high performance code. Usually the OS will do its best to keep threads on the same CPU and it's usually counterproductive to force it to do otherwise.)
What I would call thread affinity: Where objects use thread-local storage (or some other state tied to the thread they're accessed from) and will go wrong if a sequence of actions is not done on the same thread.
From your question it sounds like you may be confusing #1 with #2. The thread itself will not change while your callback is running. While a thread is running it may jump between CPUs but that is normal and not something you have to worry about (except in very special cases).
Mutexes, semaphores, etc. do not care if a thread jumps between CPUs.
If your callback is executed by the thread pool multiple times, there is (depending on how the pool is used) usually no guarantee that the same thread will be used each time. i.e. Your callback may jump between threads, but not while it is in the middle of running; it may only change threads each time it runs again.
Some synchronization objects will care if your callback code runs on one thread and then, still thinking it holds locks on those objects, runs again on a different thread. (The first thread will still hold the locks, not the second one, although it depends which kind of synchronization object you use. Some don't care.) That isn't a #1, though; that's #2, and not something you'd use SetThreadAffinityMask to deal with.
As an example, Mutexes (CreateMutex) are owned by a thread. If you acquire a mutex on Thread A then any other thread which tries to acquire the mutex will block until you release the mutex on Thread A. (It is also an error for a thread to release a mutex it does not own.) So if your callback acquired a mutex, then exited, then ran again on another thread and released the mutex from there, it would be wrong.
On the other hand, an Event (CreateEvent) does not care which threads create, signal or destroy it. You can signal an event on one thread and then reset it on another and that's fine (normal, in fact).
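For instance, a manual-reset event can be signalled from one thread and reset from another without any notion of ownership; a small sketch:

#include <windows.h>

// Manual-reset, initially non-signalled, unnamed event.
HANDLE g_dataReady = CreateEvent(NULL, TRUE, FALSE, NULL);

// Thread A: announce that data is available.
void announce()
{
    SetEvent(g_dataReady);
}

// Thread B: wait for the announcement, then clear it for the next round.
void waitForAnnouncement()
{
    WaitForSingleObject(g_dataReady, INFINITE);
    ResetEvent(g_dataReady);
}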
It'd also be rare to hold a synchronization object between two separate runs of your callback (that would invite deadlocks, although there are certainly situations where you could legitimately want/do such a thing). However, if you created (for example) an apartment-threaded COM object then that would be something you would want to only access from one specific thread.
You shouldn't. You're only supposed to use that thread for the job at hand, on the processor it's running on at that point. Apart from the obvious inefficiency, the threadpool might destroy every thread as soon as you're done, and create a new one for your next job. The affinity masks wouldn't disappear that soon in practice, but it's even harder to debug if they disappear at random.

How can I improve my real-time behavior in multi-threaded app using pthreads and condition variables?

I have a multi-threaded application that is using pthreads. I have a mutex lock and condition variables. There are two threads: one thread is producing data for the second thread, a worker, which is trying to process the produced data in a real-time fashion such that one chunk is processed as close to the elapsing of a fixed time period as possible.
This works pretty well; however, occasionally, when the producer thread releases the condition upon which the worker is waiting, a delay of up to almost a whole second is seen before the worker thread gets control and executes again.
I know this because right before the producer releases the condition upon which the worker is waiting, it does a chunk of processing for the worker if it is time to process another chunk; then, immediately upon receiving the condition, the worker thread also does a chunk of processing if it is time to process another chunk.
In this latter case, I am seeing that I am late processing the chunk many times. I'd like to eliminate this lost efficiency and do what I can to keep the chunks ticking away as close as possible to the desired frequency.
Is there anything I can do to reduce the delay between the producer releasing the condition and the worker detecting that the condition has been released, so that the worker resumes processing sooner? For example, would it help for the producer to call something to force itself to be context-switched out?
Bottom line: the worker has to wait each time it asks the producer to create work for it, so that the producer can muck with the worker's data structures before telling the worker it is ready to run in parallel again. This period of exclusive access by the producer is meant to be short, but during this period, I am also checking for real-time work to be done by the producer on behalf of the worker while the producer has exclusive access. Somehow my hand-off back to running in parallel again occasionally results in significant delay, which I would like to avoid. Please suggest how this might best be accomplished.
I could suggest the following pattern. Generally the same technique could be used, e.g. when prebuffering frames in some real-time renderers or something like that.
First, it's obvious that the approach you describe in your message would only be effective if both of your threads are loaded equally (or almost equally) all the time. If not, multi-threading won't actually buy you much in your situation.
Now, let's think about a thread pattern that would be optimal for your problem. Assume we have a yielding thread and a processing thread. The first of them prepares chunks of data to process; the second does the processing and stores the result somewhere (where exactly isn't important here).
The effective way to make these threads work together is the proper yielding mechanism. Your yielding thread should simply add data to some shared buffer and shouldn't actually care about what would happen with that data. And, well, your buffer could be implemented as a simple FIFO queue. This means that your yielding thread should prepare data to process and make a PUSH call to your queue:
X = PREPARE_DATA()
BUFFER.LOCK()
BUFFER.PUSH(X)
BUFFER.UNLOCK()
Now, the processing thread. Its behaviour can be described this way (you should probably add some artificial delay, like SLEEP(X), between calls to EMPTY):
IF !EMPTY(BUFFER) PROCESS(BUFFER.TOP)
The important point here is what your processing thread should do with the processed data. The obvious approach is to make a POP call after the data is processed, but you will probably want to come up with something better. Anyway, in my variant this would look like:
// After data is processed
BUFFER.LOCK()
BUFFER.POP()
BUFFER.UNLOCK()
Note that the locking operations in the yielding and processing threads shouldn't actually impact your performance, because they are only performed once per chunk of data.
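Since the question already uses pthreads and condition variables, the same buffer pattern can be driven by a condition variable instead of polling with SLEEP, so the worker wakes as soon as a chunk arrives. A minimal sketch with a single-slot buffer (the chunk type and function names are invented; a real FIFO works the same way):

#include <pthread.h>

static pthread_mutex_t g_lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  g_ready = PTHREAD_COND_INITIALIZER;
static int             g_haveChunk = 0;
static double          g_chunk;          // stand-in for the real chunk of data

// Yielding (producer) thread: publish a chunk and wake the worker.
void push_chunk(double chunk)
{
    pthread_mutex_lock(&g_lock);
    g_chunk = chunk;
    g_haveChunk = 1;
    pthread_cond_signal(&g_ready);
    pthread_mutex_unlock(&g_lock);
}

// Processing (worker) thread: sleep on the condition until a chunk arrives.
double pop_chunk(void)
{
    pthread_mutex_lock(&g_lock);
    while (!g_haveChunk)                  // loop guards against spurious wakeups
        pthread_cond_wait(&g_ready, &g_lock);
    double chunk = g_chunk;
    g_haveChunk = 0;
    pthread_mutex_unlock(&g_lock);
    return chunk;
}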
Now, the interesting part. As I wrote at the beginning, this approach would only be effective if the threads act somewhat the same in terms of CPU/resource usage. There is a way to make this threading solution effective even if that condition doesn't constantly hold and depends on other runtime conditions.
This way means creating another thread, called the controller thread. This thread would merely compare the time each thread uses to process one chunk of data and balance the thread priorities accordingly. Actually, we don't even have to "compare the time"; the controller thread could simply work like this:
IF BUFFER.SIZE() > T
    DECREASE_PRIORITY(YIELDING_THREAD)
    INCREASE_PRIORITY(PROCESSING_THREAD)
Of course, you could implement some better heuristics here but the approach with controller thread should be clear.