Does Sleep(n) with n>0 relinquish CPU time to other threads - c++

Using VC++ 13 under Windows, the online help states that Sleep(0) relinquishes the remainder of the current thread's time slice to any other thread of equal priority. Is this also the case for other values? For example, if I use Sleep(1000), are 1000 ms of CPU time on the core where the current thread is running likely to be usable by another thread? I imagine this is hardware and implementation specific, so to narrow it down, assume an Intel i5 or better and Windows 7 or 8.
The reason for asking is I have a thread pool class, and I'm using an additional monitor thread to report progress, allow the user to abort long processes, etc...

Yes. Zero has a special meaning only in that it signals there is no minimum time to wait. Taken literally it would mean "I want to sleep for no time", which doesn't make much sense. What it actually means is "I want to give other threads a chance to run."
If the value is non-zero, the thread is guaranteed not to be resumed for the amount of time specified, within the clock resolution of course. When the thread is suspended this way, it gets a suspended status in the system and is not considered during scheduling. With 0 its status doesn't change, so it remains ready to run, and the function might return immediately.
Also, I don't think this is hardware related; it is purely a system-level thing.

MSDN: Sleep function
A value of zero causes the thread to relinquish the remainder of its time slice to any other thread that is ready to run. If there are no other threads ready to run, the function returns immediately, and the thread continues execution.
The special XP case is described as follows:
Windows XP: A value of zero causes the thread to relinquish the remainder of its time slice to any other thread of equal priority that is ready to run. If there are no other threads of equal priority ready to run, the function returns immediately, and the thread continues execution. This behavior changed starting with Windows Server 2003.
MSDN states that the remainder of the thread's time slice is relinquished to any other thread of equal priority. This is somewhat meaningless because a thread of higher priority would already have been scheduled ahead of the thread calling Sleep(0), and a thread of lower priority would cause Sleep(0) to return immediately without giving anything away. Therefore Sleep(0) by default only affects threads of equal priority.
Purpose of Sleep(0): it triggers the scheduler to reschedule while putting the calling thread at the end of the queue. If the queue does not contain any other threads of the same priority, the call returns immediately. If there are other threads, the delay is undetermined. Note: the Windows scheduler is not a single thread; it is spread all over the OS (details: How does a scheduler regain control when wanted?).
The detailed behavior depends on the system's timer resolution setting (How to get the current Windows system-wide timer resolution). This setting also influences the thread's time slice, which varies with the system timer resolution.
The system timer resolution defines the heartbeat of the system. This causes thread quanta to have specific values. The timer resolution granularity also determines the resolution of Sleep(period). Consequently, the accuracy of sleep periods is determined by the system's heartbeat. However, a high-resolution timer setting increases power consumption.
A Sleep(period) with period > 0 triggers the scheduler and prohibits scheduling of the calling thread for at least the requested period.
Consequently, the calling thread's time slice is interrupted: it ends immediately.
Yes, Sleep(period) with period > 0 relinquishes CPU time to other threads (if any applicable).
(Further reading: Few words about timer resolution, How to get an accurate 1ms Timer Tick under WinXP, and Limits of Windows Queue Timers).
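As a rough illustration of the point about timer resolution, the following portable C++ sketch measures how long a sleep actually takes. It uses std::this_thread::sleep_for as a stand-in for Sleep(period), which is an assumption (the two are not guaranteed to behave identically), and measured_sleep is a hypothetical helper name.

```cpp
#include <chrono>
#include <thread>

// Sleeps for `requested` and returns how long the call actually took.
// The excess over the request reflects timer resolution and scheduling delay.
std::chrono::microseconds measured_sleep(std::chrono::milliseconds requested)
{
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    std::this_thread::sleep_for(requested);  // the thread is not scheduled during this time
    const auto stop = clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
}
```

Calling measured_sleep(std::chrono::milliseconds(1)) repeatedly and printing the results should show how far the real delay is from the requested one on a given machine.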

Related

How do hardware timers work and affect software performance?

I want to use async functions calling. I chose boost::deadline_timer.
For me, a hardware timer is a specific piece of hardware (surprisingly) that works independently of the CPU and whose only duty is monitoring time. At the same time, if I understand correctly, it can also be used for setting a timeout and generating an interrupt when the timeout has been reached. (timers)
The primary advantage of that is asynchronous execution. The thread that set a timer can continue working and the callback function will be triggered in the same thread where the timer has been set.
Let me describe as I see it in action.
The application contains one or more worker threads. E.g. they process input items and filter them. Let's consider that application has 5 threads and each thread set one timer (5 seconds).
The application is working. E.g. the current thread is thread-3.
The timer (thread-0) has expired and generates (probably the wrong term) an interrupt.
Thread context switch (thread-3 -> thread-0);
Callback function execution;
The timer (thread-1) has expired and generates an interrupt.
...
And so on
P.S. I understand that this is not the only possible case for a multi-threaded application.
Questions:
Did I describe the process correctly?
Do I understand correctly that even if the current thread is thread-0, this also leads to a context switch, since the thread has to stop executing its current code and switch to executing the callback function?
If each thread sets 100k or 500k timers, how will that affect performance?
Does the hardware limit the number of timers?
How expensive is it to update the timeout of a timer?
A hardware timer is, at its core, just a count-up counter and a set of comparators (or a count-down counter that uses the borrow of the MSb as an implicit comparison with 0).
Picture it as a register with a specialized operation Increment (or Decrement) that is started at every cycle of a clock (the easiest kind of counter with this operation is the Ripple-counter).
Each cycle the counter value is also fed to the comparator, previously loaded with a value, and its output will be the input to the CPU (as an interrupt or in a specialized pin).
In the case of a count-down counter, the borrow from the MSb acts as the signal that the value rolled over zero.
These timers usually have more functions, like the ability to stop after they reach the desired value (one-shot), to reset (periodic), to alternate the output state low and high (square wave generator), and other fancy features.
There is no limit on how many timers you can put on a package, of course, but although they are simple circuits, they still have a cost in terms of money and space.
Most MCUs have one or two timers; when there are two, the idea is to use one for generic scheduling and the other for high-priority tasks orthogonal to the OS scheduling.
It's worth noting that having many hardware timers (to be used by the software) is useless unless there are also many CPUs/MCUs since it's easier to use software timers.
On x86 the HPET is actually made of at most 8 timer blocks, each with up to 32 comparators, for a total of 256 timers as seen from the software POV.
The idea was to assign each timer to a specific application.
Applications in an OS don't use the hardware timers directly, because there can possibly be a lot of applications but just one or two timers.
So what the OS does is share the timer.
It does this by programming the timer to generate an interrupt every X units of time and by registering an ISR (Interrupt Service Routine) for such an event.
When a thread/task/program sets up a timer, the OS appends the timer information (periodic vs one-shot, period, ticks left, and callback) to a priority queue using the absolute expiration time as the key (see Peter Cordes comments below) or a list for simple OSes.
Each time the ISR is called the OS will peek at the queue and see if the element on top is expired.
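The queue described above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the names SoftTimer, LaterFirst, TimerQueue, add and on_tick are hypothetical, time is an abstract tick count, and callbacks are run directly from on_tick, which is just one of the policies a real OS might use.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical software timer record (illustrative layout only).
struct SoftTimer {
    std::uint64_t expires_at;        // absolute expiration time, in ticks
    std::uint64_t period;            // 0 = one-shot, otherwise re-arm interval
    std::function<void()> callback;
};

// Orders the priority queue so the earliest expiration is on top.
struct LaterFirst {
    bool operator()(const SoftTimer& a, const SoftTimer& b) const {
        return a.expires_at > b.expires_at;
    }
};

class TimerQueue {
public:
    void add(std::uint64_t now, std::uint64_t delay, std::uint64_t period,
             std::function<void()> cb) {
        queue_.push(SoftTimer{now + delay, period, std::move(cb)});
    }

    // What the tick handler would do: peek at the top element, and while it
    // has expired, pop it, run its callback and re-arm it if it is periodic.
    // Returns the number of callbacks that were run.
    std::size_t on_tick(std::uint64_t now) {
        std::size_t fired = 0;
        while (!queue_.empty() && queue_.top().expires_at <= now) {
            SoftTimer t = queue_.top();
            queue_.pop();
            t.callback();
            ++fired;
            if (t.period != 0) {
                t.expires_at += t.period;
                queue_.push(std::move(t));
            }
        }
        return fired;
    }

private:
    std::priority_queue<SoftTimer, std::vector<SoftTimer>, LaterFirst> queue_;
};
```

Keying on the absolute expiration time means a tick only has to compare the top element against the current time, instead of decrementing a counter in every pending timer.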
What happens when a software timer is expired is OS-dependent.
Some embedded and small OSes may call the timer's callback directly from the context of the ISR.
This is often true if the OS doesn't really have a concept of thread/task (and so of context switch).
Other OSes may append the timer's callback to a list of "to be called soon" functions.
This list will be walked and processed by a specialized task. This is how FreeRTOS does it if the timer task is enabled.
This approach keeps the ISR short and allows programming the hardware timer with a shorter period (in many architectures interrupts are ignored while in an ISR, either by the CPU automatically masking interrupts or by the interrupt controller).
IIRC Windows does something similar: it posts an APC (Asynchronous Procedure Call) in the context of the thread that set the software timer that just expired. When the thread is scheduled, the APC will (in the form of a window message or not, depending on the specific API used) call the callback. If the thread was waiting on the timer, I think it is just set to the ready state. In any case, it's not scheduled right away, but it may get a priority boost.
Where the ISR will return is still OS-dependent.
An OS may continue executing the interrupted thread/task until it's scheduled out. In this case, you don't have step 4 immediately after step 3; instead, thread-3 will run until its quantum expires.
Conversely, an OS may signal the end of the ISR to the hardware and then schedule the thread with the callback.
This approach doesn't work if two or more timers expire in the same tick, so a better approach would be to trigger a rescheduling, letting the scheduler pick the most appropriate thread.
The scheduling may also take into account other hints given by the thread during the creation of the software timer.
The OS may also just switch context, execute the callback and get back to the ISR context where it continues peeking at the queue.
The OS may even do any of that based on the period of the timer and other hints.
So it works pretty much like you imagined, except that a thread may not be called immediately upon the timer's expiration.
Updating a timer is not expensive.
While all in all the total work is not much, the timer ISR is meant to be called many many times a second.
In fact, I'm not even sure an OS will allow you to create such a huge number (500k) of timers.
Windows can manage a lot of timers (and their backing threads) but probably not 500k.
The main problem with having a lot of timers is that even if each one performs little work, the total work performed may be too much to keep up with the rate of ticking.
If each X units (e.g. 1ms) of time 100 timers expire, you have X/100 units of time (e.g. 10us) to execute each callback and the callback's code may just be too long to execute in that slice of time.
When this happens the callbacks will be called less often than desired.
More CPUs/cores will allow some callbacks to execute in parallel and would alleviate the pressure.
In general, you need different timers if they run at different rates; otherwise, a single timer that walks a data structure filled with elements of work/data is fine.
Multi-threading can provide concurrency if your tasks are I/O-bound (files, network, input, and so on) or parallelism if you have a multi-processor system.

Accurate Sleep with cancellation

I need to implement a delay or sleep function that is accurate and consistent, and must be able to be cancelled.
Here's my code:
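#include <windows.h>
#include <atomic>
std::atomic<bool> cancel_flag(false); // set by the main thread to abort the sleep
void My_Sleep(unsigned int duration)
{
    static const unsigned int SLEEP_INTERVAL = 10U; // 10 milliseconds
    // Sleep in short slices so the cancel flag is checked regularly.
    while ((!cancel_flag) && (duration > SLEEP_INTERVAL))
    {
        Sleep(SLEEP_INTERVAL);
        duration -= SLEEP_INTERVAL;
    }
    // Sleep for whatever is left (at most one interval).
    if ((!cancel_flag) && (duration > 0U))
    {
        Sleep(duration);
    }
}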
The above function is run in a worker thread. The main thread is able to change the value of the "cancel_flag" in order to abort (cancel) the sleeping.
At my shop, we have different results when the duration is 10 seconds (10000 ms). Some PCs are showing a sleep duration of 10 seconds, other PCs are showing 16 seconds.
Articles about the Sleep() function say that it is bound to the windows interrupt and when the duration elapses, the thread is rescheduled (may not be run immediately). The function above may be encountering a propagation of time error due to rescheduling and interrupt latency.
The Windows Timestamp Project describes another technique of waiting on a timer object. My understanding is that this technique doesn't provide a means of cancellation (by another, main, thread).
Question:
1. How can I improve my implementation for a thread delay or sleep, that can be cancelled by another task, and is more consistent?
2. Can a sleeping AFX thread be terminated? (What happens when a main thread terminates a sleeping AFX thread?)
3. What happens when a main thread terminates a thread that has called WaitForSingleObject?
Accuracy should be around 10ms, as in 10 seconds + 10ms.
Results should be consistent across various PCs (all running Windows 7 or 10).
Background
PCs that have correct timings are running Windows 7, at 2.9 GHz.
The PCs that have incorrect timings are running Windows 7, at 3.1 GHz and have fewer simultaneous tasks and applications running.
Application is developed using Visual Studio 2017, and MFC framework (using AFX for thread creation).
You shouldn't be implementing this at all.
In C++11 all basic necessary utilities for multithreading are implemented in the standard.
If you do not use C++11 - then switch to C++11 or higher - in the unfortunate case that you cannot, then use Boost which has the same features.
Basically, what you want to do with this functionality is covered by std::condition_variable. You can put a thread into a waiting state with wait (which accepts a predicate that must hold before the wait ends), or with wait_for and wait_until (the same as wait but with a limit on the total waiting time). The notify_one and notify_all methods wake the sleeping threads (one or all) so that they check the wake-up condition and proceed with their tasks.
Check out std::condition_variable in the reference. Just google it and you'll find enough information about it with examples.
In case you do not trust the std::condition_variable implementation for some reason, you can still utilize it for mini waits and awakening.
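A minimal sketch of what such a cancellable sleep could look like with std::condition_variable follows. The class name CancellableSleep and its members are hypothetical, chosen only for illustration; the standard calls used are wait_for with a predicate and notify_all.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper: a sleep that another thread can cut short.
class CancellableSleep {
public:
    // Blocks for up to `duration`. Returns true if the full time elapsed,
    // false if cancel() was called first (or had already been called).
    bool sleep_for(std::chrono::milliseconds duration) {
        std::unique_lock<std::mutex> lock(mutex_);
        // wait_for with a predicate copes with spurious wake-ups and returns
        // the predicate's final value, i.e. true when cancelled.
        return !cv_.wait_for(lock, duration, [this] { return cancelled_; });
    }

    // Called from another thread to wake the sleeper early.
    void cancel() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            cancelled_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool cancelled_ = false;
};
```

The worker thread would call sleep_for(std::chrono::milliseconds(10000)) and the main thread would call cancel() to abort it. Because the wait is a single timed wait rather than a loop of short sleeps, the per-iteration rounding to the timer tick does not accumulate.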
How can I improve my implementation for a thread delay or sleep, that
can be cancelled by another task, and is more consistent?
High precision sleep has been discussed here before. I have used a waitable timer approach similar to what's described here.
Can a sleeping AFX thread be terminated? (What happens when a main
thread terminates a sleeping AFX thread?)
I assume you mean terminate with TerminateThread? AFX threads are simply wrappers around standard Windows threads. There is nothing special or magical about them that would differentiate their behavior. So what would happen is a bunch of bad stuff:
If the target thread owns a critical section, the critical section will not be released.
If the target thread is allocating memory from the heap, the heap lock will not be released.
If the target thread is executing certain kernel32 calls when it is terminated, the kernel32 state for the thread's process could be inconsistent.
If the target thread is manipulating the global state of a shared DLL, the state of the DLL could be destroyed, affecting other users of the DLL.
You should never have to do this if you have access to all the source code of the application.
What happens when a main thread terminates a thread that has called
WaitForSingleObject?
See previous bullet. CreateEvent and WaitForSingleObject are actually a recommended way of cleanly terminating threads.

Does an event respond faster than a semaphore?

In a project I ran into a case like this (on Windows 7):
When several threads are busy (all my CPU cores are busy working), there is a delay before a thread receives a semaphore (which is increased from 0 to 1). It may be as long as 1.5 ms.
I solved this by caching a few things and increasing the semaphore value earlier.
So to me it seems that signaling a semaphore is slow: it is not immediately received by threads (especially when the CPUs are busy), but if you signal it before some thread begins to wait on it, there is no delay.
I once thought an event is just a semaphore with a maximum value of 1. Now, having met this case, I'm beginning to wonder whether an event is faster than a semaphore at notifying threads to wake up.
Sorry, I tried but couldn't come up with a demo; I'm not very good at threading yet.
EDIT:
Is it true that Event is faster than Semaphore on Windows?
1.5 milliseconds is not explained by just the overhead between different multithreading primitives.
To simplify, Threads have three states
blocked
runnable
running
If a thread is waiting on a semaphore or an event, then it's blocked. When the event is signalled, it becomes runnable.
So the real question is, "When does a runnable thread actually run?" This varies according to scheduler algorithms, etc, but obviously it needs to run on a core, and that means nothing else can be "running" on that core at the same time. The scheduler will normally 'remove' the current running thread from a core when one of the following happens
it waits on a semaphore/event, and so becomes 'blocked'
It's been running continually for a certain time (time-based, or round-robin scheduling)
A higher priority thread becomes runnable.
The 1.5 milliseconds is probably round-robin, or time-based, scheduling. Your thread is runnable but just hasn't started yet. If the thread must start right away and should boot out the current thread, then you can try to increase its priority via SetThreadPriority
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686277(v=vs.85).aspx
If a thread is waiting on a semaphore and it gets signaled, the thread will in my limited testing, become running in ~10us on a box that is not overloaded.
Signaling, and subsequent dispatching onto a core, will take longer if:
The signaled thread is in a different process than the thread it preempts.
Running the signaled thread requires a thread running on another core to be preempted.
The box is already overloaded with higher-priority threads.
1.5ms must represent an extreme case where your box is very busy.
In such a case, replacing the semaphore with an event is unlikely to result in any significant improvement to overall signaling latency, because the bulk of the work/delay required by the inter-thread signaling is tied up in the scheduling/dispatching, which is required in either case.
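To get a feel for the delay between "runnable" and "running", something like the following portable C++ sketch could be used. It is an assumption-laden stand-in: it uses std::condition_variable rather than a Windows semaphore or event, wakeup_latency is a hypothetical name, and the measured value also includes the time needed to release the mutex and issue the notification.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Measures the time between signalling a blocked thread and that thread
// actually running again.
std::chrono::microseconds wakeup_latency()
{
    using clock = std::chrono::steady_clock;
    std::mutex m;
    std::condition_variable cv;
    bool signalled = false;
    clock::time_point signalled_at, woke_at;

    std::thread waiter([&] {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return signalled; });  // blocked until signalled
        woke_at = clock::now();                    // running again
    });

    // Give the waiter time to block (best effort, not a guarantee).
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    {
        std::lock_guard<std::mutex> lock(m);
        signalled = true;
        signalled_at = clock::now();
    }
    cv.notify_one();  // the waiter becomes runnable here
    waiter.join();

    return std::chrono::duration_cast<std::chrono::microseconds>(woke_at - signalled_at);
}
```

Running this on an idle machine and again while all cores are busy should show how much of the latency comes from scheduling rather than from the primitive itself.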

Significance of Sleep(0)

I used to see Sleep(0) in some part of my code where some infinite/long while loops are available. I was informed that it would make the time-slice available for other waiting processes. Is this true? Is there any significance for Sleep(0)?
According to MSDN's documentation for Sleep:
A value of zero causes the thread to relinquish the remainder of its time slice to any other thread that is ready to run. If there are no other threads ready to run, the function returns immediately, and the thread continues execution.
The important thing to realize is that yes, this gives other threads a chance to run, but if there are none ready to run, then your thread continues -- leaving the CPU usage at 100% since something will always be running. If your while loop is just spinning while waiting for some condition, you might want to consider using a synchronization primitive like an event to sleep until the condition is satisfied or sleep for a small amount of time to prevent maxing out the CPU.
Yes, it gives other threads the chance to run.
A value of zero causes the thread to relinquish the remainder of its time slice to any other thread that is ready to run. If there are no other threads ready to run, the function returns immediately, and the thread continues execution.
Source
I'm afraid I can't improve on the MSDN docs here
A value of zero causes the thread to relinquish the remainder of its time slice to any other thread that is ready to run. If there are no other threads ready to run, the function returns immediately, and the thread continues execution.
Windows XP/2000: A value of zero causes the thread to relinquish the remainder of its time slice to any other thread of equal priority that is ready to run. If there are no other threads of equal priority ready to run, the function returns immediately, and the thread continues execution. This behavior changed starting with Windows Server 2003.
Please also note (via upvote) the two useful answers regarding efficiency problems here.
Be careful with Sleep(0): if the execution time of one loop iteration is short, it can slow the loop down significantly. If you still need it, you can call Sleep(0) less often, for example once per 100 iterations.
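The "once per 100 iterations" idea could be sketched as follows. This is a portable illustration, not Windows-specific code: std::this_thread::yield() stands in for Sleep(0) (an assumption, the two are not guaranteed to behave identically), and run_with_periodic_yield is a hypothetical helper name.

```cpp
#include <cstddef>
#include <thread>

// Runs `iterations` of a short loop body and yields the CPU once every
// `yield_every` iterations. Returns how many times it yielded.
template <typename Body>
std::size_t run_with_periodic_yield(std::size_t iterations,
                                    std::size_t yield_every,
                                    Body body)
{
    std::size_t yields = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        body(i);
        if (yield_every != 0 && (i + 1) % yield_every == 0) {
            std::this_thread::yield();  // on Windows, Sleep(0) could be called here instead
            ++yields;
        }
    }
    return yields;
}
```

With yield_every set to 100, the cost of the yield is spread over 100 iterations instead of being paid on every pass through the loop.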
Sleep(0); At that instruction, the system scheduler will check for any other runnable threads and possibly give them a chance to use the system resources depending on thread priorities.
On Linux there's a specific function for this: sched_yield().
From the man pages:
sched_yield() causes the calling thread to relinquish the CPU. The thread is moved to the end of the queue for its static priority and a new thread gets to run.
If the calling thread is the only thread in the highest priority list at that time, it will continue to run after a call to sched_yield().
and also:
Strategic calls to sched_yield() can improve performance by giving other threads or processes a chance to run when (heavily) contended resources (e.g., mutexes) have been released by the caller. Avoid calling sched_yield() unnecessarily or inappropriately (e.g., when resources needed by other schedulable threads are still held by the caller), since doing so will result in unnecessary context switches, which will degrade system performance.
In one app, the main thread looked for things to do, then launched the "work" via a new thread. In this case, you should call sched_yield() (or sleep(0)) in the main thread so that you do not make the "looking" for work more important than the "work". I prefer sleep(0), but sometimes this is excessive (because you are sleeping a fraction of a second).
Sleep(0) is a powerful tool and it can improve performance in certain cases. Using it in a fast loop might be considered in special cases. When a set of threads must be as responsive as possible, they should all call Sleep(0) frequently. But it is crucial to find a measure for what responsive means in the context of the code.
I've given some details in https://stackoverflow.com/a/11456112/1504523
I am using pthreads, and for some reason on my Mac the compiler does not find a declaration of pthread_yield(). But it seems that sleep(0) does the same thing.

Could someone explain this interesting behaviour with Sleep(1)?

I was testing how long various Win32 API calls wait when asked to wait for 1 ms. I tried:
::Sleep(1)
::WaitForSingleObject(handle, 1)
::GetQueuedCompletionStatus(handle, &bytes, &key, &overlapped, 1)
I was measuring the elapsed time using QueryPerformanceCounter and QueryPerformanceFrequency. The elapsed time was about 15 ms most of the time, which is expected and documented all over the Internet. However, for a short period of time the waits were taking about 2 ms! It happened consistently for a few minutes, but now it is back to 15 ms. I did not use the timeBeginPeriod() and timeEndPeriod() calls! Then I tried the same app on another machine, and there the waits constantly take about 2 ms! Both machines have Windows XP SP2 and the hardware should be identical. Is there something that explains why the wait times vary so much? TIA
Thread.Sleep(0) will let any threads of the same priority execute. Thread.Sleep(1) will let any threads of the same or lower priority execute.
Each thread is given an interval of time to execute in, before the scheduler lets another thread execute. As Billy ONeal states, calling Thread.Sleep will give up the rest of this interval to other threads (subject to the priority considerations above).
Windows balances threads across the entire OS, not just in your process. This means that other threads on the OS can also cause your thread to be pre-empted (i.e. interrupted, with the rest of the time interval given to another thread).
There is an article that might be of interest on the topic of Thread.Sleep(x) at:
Priority-induced starvation: Why Sleep(1) is better than Sleep(0) and the Windows balance set manager
Changing the timer's resolution can be done by any process on the system, and the effect is seen globally. See this article on how the Hotspot Java compiler deals with time on Windows, specifically:
Note that any application can change the timer interrupt and that it affects the whole system. Windows only allows the period to be shortened, thus ensuring that the shortest requested period by all applications is the one that is used. If a process doesn't reset the period then Windows takes care of it when the process terminates. The reason why the VM doesn't just arbitrarily change the interrupt rate when it starts - it could do this - is that there is a potential performance impact to everything on the system due to the 10x increase in interrupts. However other applications do change it, typically multi-media viewers/players.
The biggest thing sleep(1) does is give up the rest of your thread's quantum. How long that takes depends entirely upon how much of your thread's quantum remains when you call sleep.
To aggregate what was said before:
CPU time is assigned in quanta (time slices)
The thread scheduler picks the thread to run. This thread may run for the entire time slice, even if threads of higher priority become ready to run.
Typical time slices are 8..15ms, depending on architecture.
The thread can "give up" the time slice, typically with Sleep(0) or Sleep(1). Sleep(0) allows another thread of the same or higher priority to run for the next time slice. Sleep(1) allows "any" thread.
The time slice is global and can be affected by all processes
Even if you don't change the time slice, someone else could.
Even if the time slice doesn't change, you may "jump" between the two different times.
For simplicity, assume a single core, your thread and another thread X.
If Thread X runs at the same priority as yours, crunching numbers, your Sleep(1) will take an entire time slice, 15 ms being typical on client systems.
If Thread X runs at a lower priority, and gives up its own time slice after 4 ms, your Sleep(1) will take 4 ms.
I would say it just depends on how loaded the CPU is; if there aren't many other processes/threads, control could get back to the calling thread a lot faster.