I would like to write a program, where several worker threads should process different tasks with different priorities. Large tasks would be processed with low priority and small tasks with a very high priority.
In a perfect world I would simply set a different priority for each kind of task, but since there are more task types than priority levels available on Windows, I think I have to set the thread priorities dynamically.
I think there should be a main thread with the highest priority, working as a kind of scheduler that sets the priorities of the worker threads dynamically. But I wonder what actually happens on Windows when I call SetThreadPriority(), and especially how quickly the OS takes the priority change into account.
Ideally I need to boost the priority of a 'small task thread' within < 1 ms. Is this possible? And is there any way to reduce the latency (if there is any) with which the OS reacts to the priority change?
The Windows dispatcher (scheduler) is not a single process/thread; it is spread across the kernel. The dispatcher is generally triggered by the following events:
1. A thread becomes ready for execution
2. A thread leaves the running state (e.g. quantum expires, wait state, or done)
3. A thread's priority changes (e.g. SetThreadPriority)
4. A thread's processor affinity changes
I need to boost the priority of a 'small task thread' within < 1 ms. Is this possible?
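According to item 3: yes, the priority change itself triggers the dispatcher, so it will reschedule immediately. The boosted thread then runs as soon as it outranks what is currently running on a core it may use.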
Ref.: Windows Internals Tour: Windows Processes, Threads and Memory, Microsoft Academic Club 2011
I am using gcc c++ 4.7 on Debian 7. I want to set some priorities for my threads. Looks like I have to do it through pthread. I am getting confused by the scheduler policy and priority in pthread.
Q1:
I use sched_setscheduler in my C++ code to set the thread scheduler to SCHED_RR. Will all the threads in this process use this real-time scheduler? Or can I have different scheduling policies for different threads in one process?
Q2:
Does the thread priority take effect only inside the process or across multiple processes? E.g. I have two processes, both using SCHED_RR. One has a thread with priority 99, the other has a thread with priority 98. Does the former thread have a higher priority than the latter? And if the threads use different schedulers, how are their priorities compared?
Q3:
What "default" number should I use for a scheduler's priority? E.g. I use the code below:
struct sched_param param;
param.sched_priority = default_priority;
sched_setscheduler(0, SCHED_RR, &param);
What value should I assign to default_priority? I have some high-priority threads, some normal-priority threads and some low-priority threads in my program. Among 1-99, what numbers should I use for the priorities?
Q4:
Process priority and thread priority mixed. For example, I can use nice to set the process priority. If a process has a lower process priority but in my code I set one of its threads to a high priority, does this override the process priority setting?
I googled around and read various documents. I think I can answer my own question here.
Pthreads has a contention scope attribute, which can be PTHREAD_SCOPE_SYSTEM or PTHREAD_SCOPE_PROCESS. An implementation is not required to support both. Linux only supports PTHREAD_SCOPE_SYSTEM, which means all threads from all processes compete with each other. Also, in Linux a thread is essentially a lightweight process; the scheduler does not treat processes and threads differently.
Answers.
Q1:
Threads in the same process can have different scheduling policies and priorities.
Q2:
Threads compete across processes.
Q3:
I can use some arbitrary numbers. Each priority, from 1 to 99, will have its own queue in scheduling.
Q4:
The nice value is used by the default Linux policy, SCHED_OTHER. When a real-time policy such as SCHED_RR or SCHED_FIFO is used for a thread, the nice value has no effect. Since the minimum priority of SCHED_RR and SCHED_FIFO is 1 and a SCHED_OTHER thread's priority can only be 0, threads with a real-time policy always take scheduling precedence over non-real-time ones.
These answers apply to Linux only. Other OSes such as BSD or Solaris may have different pthread implementations.
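As a rough illustration of Q1 to Q3 on Linux, the sketch below queries the priority range the kernel reports for a policy and switches only the calling thread to that policy with pthread_setschedparam. The helper names (priority_range, set_current_thread_policy, current_thread_policy) are made up for this example. The switch to a real-time policy will normally fail with EPERM unless the process has CAP_SYS_NICE or a suitable RLIMIT_RTPRIO, so treat a non-zero return as an expected outcome rather than a bug.

```cpp
#include <pthread.h>
#include <sched.h>
#include <cassert>

// Priority range for a scheduling policy, as reported by the running kernel.
struct PriorityRange {
    int min;
    int max;
};

PriorityRange priority_range(int policy) {
    PriorityRange range = { sched_get_priority_min(policy),
                            sched_get_priority_max(policy) };
    assert(range.min <= range.max);
    return range;
}

// Try to switch ONLY the calling thread to `policy` at `priority`.
// Returns 0 on success or an error number (typically EPERM when the
// process lacks the privilege to use a real-time policy).
int set_current_thread_policy(int policy, int priority) {
    sched_param param{};
    param.sched_priority = priority;
    return pthread_setschedparam(pthread_self(), policy, &param);
}

// Read back the policy of the calling thread; returns -1 on failure.
int current_thread_policy() {
    int policy = -1;
    sched_param param{};
    if (pthread_getschedparam(pthread_self(), &policy, &param) != 0) {
        return -1;
    }
    return policy;
}
```

On a typical Linux system priority_range(SCHED_RR) is 1 to 99 and priority_range(SCHED_OTHER) is 0 to 0, which matches the answer to Q4: any real-time thread outranks every SCHED_OTHER thread. Because the call targets pthread_self(), other threads in the same process keep whatever policy they had.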
In a project I ran into a case like this (on Windows 7):
When several threads are busy (all my CPU cores are busy working), there is a delay before a thread receives a semaphore (whose count is increased from 0 to 1). It may be as long as 1.5 ms.
I worked around this by caching a few things and increasing the semaphore count earlier.
So to me it seems that signaling a semaphore is slow: it is not immediately received by waiting threads (especially when the CPUs are busy), but if you signal it before some thread begins to wait on it, there is no delay.
I once thought an event is just a semaphore with a maximum count of 1. Now, having met this case, I'm beginning to wonder whether an event is faster than a semaphore at notifying threads to 'wake up'.
Sorry, I tried but couldn't come up with a demo; I'm not very good at threading yet.
EDIT:
Is it true that Event is faster than Semaphore on Windows?
1.5 milliseconds is not explained by just the overhead between different multithreading primitives.
To simplify, threads have three states:
blocked
runnable
running
If a thread is waiting on a semaphore or an event, then it's blocked. When the event is signalled, it becomes runnable.
So the real question is, "When does a runnable thread actually run?" This varies according to scheduler algorithms, etc, but obviously it needs to run on a core, and that means nothing else can be "running" on that core at the same time. The scheduler will normally 'remove' the currently running thread from a core when one of the following happens:
It waits on a semaphore/event, and so becomes 'blocked'.
It has been running continually for a certain time (time-based, or round-robin scheduling).
A higher-priority thread becomes runnable.
The 1.5 milliseconds is probably round-robin, or time-based, scheduling: your thread is runnable but just hasn't started yet. If the thread must start at once and should boot out the currently running thread, you can try to increase its priority via SetThreadPriority:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686277(v=vs.85).aspx
If a thread is waiting on a semaphore and it gets signaled, the thread will, in my limited testing, become running in ~10 us on a box that is not overloaded.
Signaling, and subsequent dispatching onto a core, will take longer if:
The signaled thread is in a different process than the thread it preempts.
Running the signaled thread requires a thread running on another core to be preempted.
The box is already overloaded with higher-priority threads.
1.5ms must represent an extreme case where your box is very busy.
In such a case, replacing the semaphore with an event is unlikely to result in any significant improvement to overall signaling latency, because the bulk of the work/delay required by the inter-thread signaling is tied up in the scheduling/dispatching, which is required in either case.
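To get a feel for numbers like these on your own machine, here is a minimal portable sketch that measures the gap between signalling a blocked thread and that thread running again. It uses std::condition_variable as a stand-in for the Win32 semaphore or event, so the absolute values will differ from the ones quoted above; measure_wake_latency_us is a name invented for this example.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Returns the time in microseconds between signalling a blocked thread
// and that thread actually running again.
long long measure_wake_latency_us() {
    using clock = std::chrono::steady_clock;
    std::mutex m;
    std::condition_variable cv;
    bool waiting = false;    // set by the waiter just before it blocks
    bool signalled = false;  // set by the signaller
    clock::time_point sent, woke;

    std::thread waiter([&] {
        std::unique_lock<std::mutex> lock(m);
        waiting = true;
        cv.notify_all();                           // tell the signaller we are about to block
        cv.wait(lock, [&] { return signalled; });  // blocked here
        woke = clock::now();                       // running again
    });

    {
        std::unique_lock<std::mutex> lock(m);
        // Once this returns, the waiter has released the mutex inside wait().
        cv.wait(lock, [&] { return waiting; });
        signalled = true;
        sent = clock::now();
    }
    cv.notify_all();  // blocked -> runnable; the scheduler decides when it runs
    waiter.join();

    assert(woke >= sent);
    return std::chrono::duration_cast<std::chrono::microseconds>(woke - sent).count();
}
```

Calling it in a loop on an idle machine and again while every core runs a busy loop should show how much of the delay comes from scheduling and how little from the primitive itself.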
I've just come across the Get/SetThreadPriority methods and they got me wondering: can a thread priority meaningfully be set higher than the owning process priority (which I don't believe can be changed programmatically in the same way)?
Are there any pitfalls to using these APIs?
Yes, you can set the thread priority to any class, including a class higher than the one of the current process. In fact, these two values are complementary and provide the base priority of the thread. You can read about it in the Remarks section of the link you posted.
You can set the process priority using SetPriorityClass.
Now that we got the technicalities out of the way, I find little use for manipulating the priority of a thread directly. The OS scheduler is sophisticated enough to boost the priority of threads blocked in I/O over threads doing CPU computations (to the point that an I/O thread will preempt a CPU thread when the I/O interrupt arrives). In fact, even I/O threads are differentiated, with keyboard I/O threads getting a priority boost over file I/O threads for example.
On Windows, the thread and process priorities are combined using an algorithm that decides overall scheduling priority:
Windows priorities
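As a rough illustration of how the two settings combine, the sketch below reproduces the base-priority table from the Windows scheduling documentation for the seven standard thread levels. The enums and base_priority are hypothetical names for this example, not the Win32 constants (such as NORMAL_PRIORITY_CLASS or THREAD_PRIORITY_HIGHEST), and the real scheduler may still apply temporary boosts on top of the base value.

```cpp
#include <cassert>

// Hypothetical enums mirroring the Windows priority classes and the
// relative thread priority levels; these are NOT the Win32 constants.
enum class PriorityClass { Idle, BelowNormal, Normal, AboveNormal, High, Realtime };
enum class ThreadLevel { Idle, Lowest, BelowNormal, Normal, AboveNormal, Highest, TimeCritical };

// Base priority (1..31) for a process class combined with a thread level.
int base_priority(PriorityClass cls, ThreadLevel level) {
    const bool realtime = (cls == PriorityClass::Realtime);

    // The two extreme levels saturate instead of acting as offsets.
    if (level == ThreadLevel::Idle) return realtime ? 16 : 1;
    if (level == ThreadLevel::TimeCritical) return realtime ? 31 : 15;

    int class_base = 0;
    switch (cls) {
        case PriorityClass::Idle:        class_base = 4;  break;
        case PriorityClass::BelowNormal: class_base = 6;  break;
        case PriorityClass::Normal:      class_base = 8;  break;
        case PriorityClass::AboveNormal: class_base = 10; break;
        case PriorityClass::High:        class_base = 13; break;
        case PriorityClass::Realtime:    class_base = 24; break;
    }

    // Lowest..Highest map to offsets -2..+2 around the class base.
    const int offset = static_cast<int>(level) - static_cast<int>(ThreadLevel::Normal);
    const int result = class_base + offset;
    assert(result >= 1 && result <= 31);
    return result;
}
```

For instance, a highest-level thread in a normal-class process gets the same base priority (10) as a normal-level thread in an above-normal process, which is the sense in which a thread can be raised above its process's class.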
Pitfalls? Well:
Raising the priority of a thread is likely to give the greatest overall gain if it is usually blocked on I/O but must run ASAP after being signaled by its driver, e.g. video I/O that must process buffers quickly.
Raising the priority of threads is likely to have the greatest overall negative impact if they are CPU-bound and raised to a high priority, so preventing normal-priority threads from running. If taken to extremes, OS threads and utilities like Task Manager will not run.
It is always said that when the count of a semaphore is 0, the processes requesting the semaphore are blocked and added to a wait queue.
When some process releases the semaphore and the count increases from 0 to 1, a blocked process is activated. This can be any process, randomly picked from the blocked processes.
Now my question is:
If they are added to a queue, why is the activation of blocked processes NOT in FIFO order? I think it would be easy to pick the next process from the queue rather than picking a process at random and granting it the semaphore. If there is some idea behind this random logic, please explain. Also, how does the kernel select a process at random from a queue? Picking a random element, from a queue of all things, seems complex as far as the queue data structure is concerned.
tags: various OSes as each have a kernel usually written in C++ and mutex shares similar concept
A FIFO is the simplest data structure for the waiting list in a system that doesn't support priorities, but it's not the absolute answer otherwise. Depending on the scheduling algorithm chosen, different threads might have different absolute priorities, or some sort of decaying priority might be in effect, in which case the OS might choose the thread which has had the least CPU time in some preceding interval. Since such strategies are widely used (particularly the latter), the usual rule is to consider that you don't know (although with absolute priorities, it will be one of the threads with the highest priority).
When a process is scheduled "at random", it's not that a process is randomly chosen; it's that the selection process is not predictable.
The algorithm used by Windows kernels is that there is a queue of threads (Windows schedules "threads", not "processes") waiting on a semaphore. When the semaphore is released, the kernel schedules the next thread waiting in the queue. However, scheduling the thread does not immediately make that thread start executing; it merely makes the thread able to execute by putting it in the queue of threads waiting to run. The thread will not actually run until a CPU has no threads of higher priority to execute.
While the thread is waiting in the scheduling queue, another thread that is actually executing may wait on the same semaphore. In a traditional queue system, that new thread would have to stop executing and go to the end of the queue waiting in line for that semaphore.
In recent Windows kernels, however, the new thread does not have to stop and wait for that semaphore. If the thread that has been assigned that semaphore is still sitting in the run queue, the semaphore may be reassigned to the new thread, causing the old thread to go back to waiting on the semaphore again.
The advantage of this is that the thread that was about to have to wait in the queue for the semaphore and then wait in the queue to run will not have to wait at all. The disadvantage is that you cannot predict which thread will actually get the semaphore next, and it's not fair so the thread waiting on the semaphore could potentially starve.
It is not that it CAN'T be FIFO; in fact, I'd bet many implementations ARE, for just the reasons that you state. The spec isn't that the process is chosen at random; it is that it isn't specified, so your program shouldn't rely on it being chosen in any particular way. (It COULD be chosen at random; just because it isn't the fastest approach doesn't mean it can't be done.)
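If a program really does need first-come, first-served hand-off, it has to build that ordering itself instead of relying on the OS primitive. One possible sketch, assuming a ticket scheme on top of std::mutex and std::condition_variable (FifoSemaphore, waiting and demo_fifo_order are names invented here, not an OS API):

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class FifoSemaphore {
public:
    explicit FifoSemaphore(unsigned initial) : count_(initial) {}

    // Blocks until it is this caller's turn AND a unit is available.
    void acquire() {
        std::unique_lock<std::mutex> lock(m_);
        const unsigned long long ticket = next_ticket_++;
        cv_.wait(lock, [&] { return ticket == now_serving_ && count_ > 0; });
        --count_;
        ++now_serving_;
        cv_.notify_all();  // the next ticket holder may be able to proceed too
    }

    void release() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++count_;
        }
        cv_.notify_all();
    }

    // Number of callers currently queued inside acquire().
    unsigned long long waiting() {
        std::lock_guard<std::mutex> lock(m_);
        return next_ticket_ - now_serving_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    unsigned count_;
    unsigned long long next_ticket_ = 0;
    unsigned long long now_serving_ = 0;
};

// Queues thread 1, then thread 2, and releases one unit at a time.
// Returns the ids in the order in which the threads got through.
std::vector<int> demo_fifo_order() {
    FifoSemaphore sem(0);
    std::vector<int> order;
    std::mutex order_mutex;
    auto worker = [&](int id) {
        sem.acquire();
        std::lock_guard<std::mutex> lock(order_mutex);
        order.push_back(id);
    };

    std::thread t1(worker, 1);
    while (sem.waiting() != 1) std::this_thread::yield();  // t1 is queued
    std::thread t2(worker, 2);
    while (sem.waiting() != 2) std::this_thread::yield();  // t2 is queued behind t1

    sem.release();
    t1.join();  // only ticket 0 (t1) can take the first unit
    sem.release();
    t2.join();
    assert(order.size() == 2);
    return order;
}
```

The price is the one described above: waking every waiter so that only the right one proceeds costs extra context switches, and strict ordering gives up the throughput gain of letting an already running thread take the semaphore.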
All of the other answers here are great descriptions of the basic problem - especially around thread priorities and ready queues. Another thing to consider however is IO. I'm only talking about Windows here, since it is the only platform I know with any authority, but other kernels are likely to have similar issues.
On Windows, when an IO completes, something called a kernel-mode APC (Asynchronous Procedure Call) is queued against the thread which initiated the IO in order to complete it. If the thread happens to be waiting on a scheduler object (such as the semaphore in your example) then the thread is removed from the wait queue for that object which causes the (internal kernel mode) wait to complete with (something like) STATUS_ALERTED. Now, since these kernel-mode APCs are an implementation detail, and you can't see them from user mode, the kernel implementation of WaitForMultipleObjects restarts the wait at that point which causes your thread to get pushed to the back of the queue. From a kernel mode perspective, the queue is still in FIFO order, since the first caller of the underlying wait API is still at the head of the queue, however from your point of view, way up in user mode, you just got pushed to the back of the queue due to something you didn't see and quite possibly had no control over. This makes the queue order appear random from user mode. The implementation is still a simple FIFO, but because of IO it doesn't look like one from a higher level of abstraction.
I'm guessing a bit more here, but I would have thought that unix-like OSes have similar constraints around signal delivery and places where the kernel needs to hijack a process to run in its context.
Now this doesn't always happen, but the documentation has to be conservative. Unless the order is explicitly guaranteed to be FIFO (which, as described above, it can't be, for Windows at least), the ordering is described in the documentation as "random" or "undocumented" or something similar, because a random process controls it. It also gives the OS vendors latitude to change the ordering at some later time.
Process scheduling algorithms are very specific to system functionality and operating system design. It will be hard to give a good answer to this question. If I am on a general PC, I want something with good throughput and average wait/response time. If I am on a system where I know the priority of all my jobs and know I absolutely want all my high priority jobs to run first (and don't care about preemption/starvation), then I want a Priority algorithm.
As far as a random selection goes, the motivation could be for various reasons. One being an attempt at good throughput, etc., as mentioned above. However, it would be non-deterministic (hypothetically) and impossible to prove. This property could be an exploitation of probability (random samples, etc.), but, again, the proofs could only be based on empirical data on whether this would really work.
I'm trying to come up with a design for a thread pool with a lot of design requirements for my job. This is a real problem for working software, and it's a difficult task. I have a working implementation but I'd like to throw this out to SO and see what interesting ideas people can come up with, so that I can compare to my implementation and see how it stacks up. I've tried to be as specific to the requirements as I can.
The thread pool needs to execute a series of tasks. The tasks can be short running (<1sec) or long running (hours or days). Each task has an associated priority (from 1 = very low to 5 = very high). Tasks can arrive at any time while the other tasks are running, so as they arrive the thread pool needs to pick these up and schedule them as threads become available.
The task priority is completely independent of the task length. In fact it is impossible to tell how long a task could take to run without just running it.
Some tasks are CPU bound while some are greatly IO bound. It is impossible to tell beforehand what a given task would be (although I guess it might be possible to detect while the tasks are running).
The primary goal of the thread pool is to maximise throughput. The thread pool should effectively use the resources of the computer. Ideally, for CPU bound tasks, the number of active threads would be equal to the number of CPUs. For IO bound tasks, more threads should be allocated than there are CPUs so that blocking does not overly affect throughput. Minimising the use of locks and using thread safe/fast containers is important.
In general, you should run higher priority tasks with a higher CPU priority (ref: SetThreadPriority). Lower priority tasks should not "block" higher priority tasks from running, so if a higher priority task comes along while all low priority tasks are running, the higher priority task will get to run.
The tasks have a "max running tasks" parameter associated with them. Each type of task is only allowed to run at most this many concurrent instances of the task at a time. For example, we might have the following tasks in the queue:
A - 1000 instances - low priority - max tasks 1
B - 1000 instances - low priority - max tasks 1
C - 1000 instances - low priority - max tasks 1
A working implementation could only run (at most) 1 A, 1 B and 1 C at the same time.
It needs to run on Windows XP, Server 2003, Vista and Server 2008 (latest service packs).
For reference, we might use the following interface:
namespace ThreadPool
{
    class Task
    {
    public:
        Task();
        void run();
    };

    class ThreadPool
    {
    public:
        ThreadPool();
        ~ThreadPool();
        void run(Task *inst);
        void stop();
    };
}
So what are we going to pick as the basic building block for this? Windows has two building blocks that look promising: I/O Completion Ports (IOCPs) and Asynchronous Procedure Calls (APCs). Both of these give us FIFO queuing without having to perform explicit locking, and with a certain amount of built-in OS support in places like the scheduler (for example, IOCPs can avoid some context switches).
APCs are perhaps a slightly better fit, but we will have to be slightly careful with them, because they are not quite "transparent". If the work item performs an alertable wait (::SleepEx, ::WaitForXxxObjectEx, etc.) and we accidentally dispatch an APC to the thread then the newly dispatched APC will take over the thread, suspending the previously executing APC until the new APC is finished. This is bad for our concurrency requirements and can make stack overflows more likely.
It needs to run on Windows XP, Server 2003, Vista and Server 2008 (latest service packs).
What feature of the system's built-in thread pools make them unsuitable for your task? If you want to target XP and 2003 you can't use the new shiny Vista/2008 pools, but you can still use QueueUserWorkItem and friends.
@DrPizza - this is a very good question, and one that strikes right to the heart of the problem. There are a few reasons why QueueUserWorkItem and the Windows NT thread pool were ruled out (although the Vista one does look interesting, maybe in a few years).
Firstly, we wanted to have greater control over when it starts up and stops threads. We have heard that the NT thread pool is reluctant to start up a new thread if it thinks that the tasks are short running. We could use the WT_EXECUTELONGFUNCTION flag, but we really have no idea whether the task is long or short.
Secondly, if the thread pool was already filled up with long running, low priority tasks, there would be no chance of a high priority task getting to run in a timely manner. The NT thread pool has no real concept of task priorities, so we can't do a QueueUserWorkItem and say "oh by the way, run this one right away".
Thirdly, (according to MSDN) the NT thread pool is not compatible with the STA apartment model. I'm not sure quite what this would mean, but all of our worker threads run in an STA.
@DrPizza - this is a very good question, and one that strikes right to the heart of the problem. There are a few reasons why QueueUserWorkItem and the Windows NT thread pool were ruled out (although the Vista one does look interesting, maybe in a few years).
Yeah, it looks like it got quite beefed up in Vista, quite versatile now.
OK, I'm still a bit unclear about how you wish the priorities to work. If the pool is currently running a task of type A with maximal concurrency of 1 and low priority, and it gets given a new task also of type A (and maximal concurrency 1), but this time with a high priority, what should it do?
Suspending the currently executing A is hairy (it could hold a lock that the new task needs to take, deadlocking the system). It can't spawn a second thread and just let it run alongside (the permitted concurrency is only 1). But it can't wait until the low priority task is completed, because the runtime is unbounded and doing so would allow a low priority task to block a high priority task.
My presumption is that it is the latter behaviour that you are after?
@DrPizza:
OK, I'm still a bit unclear about how you wish the priorities to work. If the pool is currently running a task of type A with maximal concurrency of 1 and low priority, and it gets given a new task also of type A (and maximal concurrency 1), but this time with a high priority, what should it do?
This one is a bit of a tricky one, although in this case I think I would be happy with simply allowing the low-priority task to run to completion. Usually, we wouldn't see a lot of the same types of tasks with different thread priorities. In our model it is actually possible to safely halt and later restart tasks at certain well defined points (for different reasons than this) although the complications this would introduce probably aren't worth the risk.
Normally, only different types of tasks would have different priorities. For example:
A task - 1000 instances - low priority
B task - 1000 instances - high priority
Assuming the A tasks had come along and were running, then the B tasks had arrived, we would want the B tasks to be able to run more or less straight away.
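The selection rule discussed in this thread (highest priority first, but never more than "max running tasks" instances of one type at a time) can be sketched independently of the threading machinery. The following is only an illustration under assumptions: PendingTask, TaskSelector and their members are hypothetical names, the class is not thread-safe (a real pool would call it under its own lock), and it does nothing about CPU priority or about starting extra threads for I/O-bound work.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>

// A queued unit of work; `priority` follows the question's scale
// (1 = very low .. 5 = very high).
struct PendingTask {
    std::string type;
    int priority;
    int id;
};

class TaskSelector {
public:
    // Cap the number of concurrently running tasks of one type.
    void set_limit(const std::string& type, int max_running) {
        limit_[type] = max_running;
    }

    void submit(const PendingTask& task) {
        queues_[task.priority].push_back(task);
    }

    // Picks the highest-priority task whose type is below its cap
    // (FIFO within one priority). Returns false if nothing may run now.
    bool next(PendingTask& out) {
        for (auto level = queues_.rbegin(); level != queues_.rend(); ++level) {
            std::deque<PendingTask>& queue = level->second;
            for (auto pos = queue.begin(); pos != queue.end(); ++pos) {
                if (can_run(pos->type)) {
                    out = *pos;
                    queue.erase(pos);
                    ++running_[out.type];
                    return true;
                }
            }
        }
        return false;
    }

    // Must be called when a task handed out by next() has completed.
    void finished(const PendingTask& task) {
        assert(running_[task.type] > 0);
        --running_[task.type];
    }

private:
    bool can_run(const std::string& type) const {
        auto limit = limit_.find(type);
        if (limit == limit_.end()) return true;  // no cap configured
        auto running = running_.find(type);
        const int count = (running == running_.end()) ? 0 : running->second;
        return count < limit->second;
    }

    std::map<int, std::deque<PendingTask>> queues_;  // keyed by priority
    std::map<std::string, int> limit_;
    std::map<std::string, int> running_;
};
```

A worker thread would call next() whenever it becomes free and finished() when its task returns. With low-priority A tasks queued and a high-priority B task arriving, the next free worker picks B first; as agreed above, an A task that is already running is simply allowed to finish.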