Could someone explain this interesting behaviour with Sleep(1)? - c++

I was testing how long various Win32 API calls will wait when asked to wait for 1 ms. I tried:
::Sleep(1)
::WaitForSingleObject(handle, 1)
::GetQueuedCompletionStatus(handle, &bytes, &key, &overlapped, 1)
I measured the elapsed time using QueryPerformanceCounter and QueryPerformanceFrequency. The elapsed time was about 15 ms most of the time, which is expected and documented all over the Internet. However, for a short period the waits were taking about 2 ms! This happened consistently for a few minutes, but now it is back to 15 ms. I did not use timeBeginPeriod() or timeEndPeriod() calls! Then I tried the same app on another machine, and there the waits constantly take about 2 ms! Both machines have Windows XP SP2 and the hardware should be identical. Is there something that explains why the wait times vary by so much? TIA
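A measurement like the one described can be sketched portably as below. This is only an illustration: it uses std::this_thread::sleep_for and std::chrono::steady_clock as stand-ins for ::Sleep and QueryPerformanceCounter, and the function name measure_sleep_ms is made up for this example.

```cpp
#include <chrono>
#include <thread>

// Returns how long a sleep request of `requested_ms` milliseconds
// actually took, in milliseconds, measured with a monotonic clock.
double measure_sleep_ms(int requested_ms) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    std::this_thread::sleep_for(std::chrono::milliseconds(requested_ms));
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count();
}
```

Calling measure_sleep_ms(1) in a loop and logging the minimum and maximum is usually enough to see whether a machine is currently in the "15 ms" or the "2 ms" regime.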

Thread.Sleep(0) will let any threads of the same priority execute. Thread.Sleep(1) will let any threads of the same or lower priority execute.
Each thread is given an interval of time to execute in, before the scheduler lets another thread execute. As Billy ONeal states, calling Thread.Sleep will give up the rest of this interval to other threads (subject to the priority considerations above).
Windows balances threads over the entire OS, not just in your process. This means that other threads on the OS can also cause your thread to be pre-empted (i.e. interrupted, with the rest of the time interval given to another thread).
There is an article that might be of interest on the topic of Thread.Sleep(x) at:
Priority-induced starvation: Why Sleep(1) is better than Sleep(0) and the Windows balance set manager

Changing the timer's resolution can be done by any process on the system, and the effect is seen globally. See this article on how the Hotspot Java compiler deals with time on Windows, specifically:
Note that any application can change the timer interrupt and that it affects the whole system. Windows only allows the period to be shortened, thus ensuring that the shortest requested period by all applications is the one that is used. If a process doesn't reset the period then Windows takes care of it when the process terminates. The reason why the VM doesn't just arbitrarily change the interrupt rate when it starts - it could do this - is that there is a potential performance impact to everything on the system due to the 10x increase in interrupts. However other applications do change it, typically multi-media viewers/players.

The biggest thing sleep(1) does is give up the rest of your thread's quantum. How long the wait ends up being depends entirely upon how much of your thread's quantum remains when you call sleep.

To aggregate what was said before:
CPU time is assigned in quantums (time slices)
The thread scheduler picks the thread to run. This thread may run for the entire time slice, even if threads of higher priority become ready to run.
Typical time slices are 8..15ms, depending on architecture.
The thread can "give up" the time slice, typically with Sleep(0) or Sleep(1). Sleep(0) allows another thread of the same or higher priority to run for the next time slice. Sleep(1) allows "any" thread.
The time slice is global and can be affected by all processes
Even if you don't change the time slice, someone else could.
Even if the time slice doesn't change, you may "jump" between the two different times.
For simplicity, assume a single core, your thread and another thread X.
If thread X runs at the same priority as yours, crunching numbers, your Sleep(1) will take an entire time slice, 15 ms being typical on client systems.
If thread X runs at a lower priority and gives up its own time slice after 4 ms, your Sleep(1) will take 4 ms.

I would say it just depends on how loaded the CPU is. If there aren't many other processes/threads, it could get back to the calling thread a lot faster.

Related

Is it really impossible to suspend two std/posix threads at the same time?

I want to briefly suspend multiple C++ std threads, running on Linux, at the same time.
It seems this is not supported by the OS.
The threads work on tasks that take an uneven and unpredictable amount of time (several seconds).
I want to suspend them when the CPU temperature rises above a threshold.
It is impractical to check for suspension within the tasks, only inbetween tasks.
I would like to simply have all workers suspend operation for a few milliseconds.
How could that be done?
What I'm currently doing
I'm currently using a condition variable in a slim, custom binary semaphore class (think C++20 Semaphore).
A worker checks for suspension before starting the next task by acquiring and immediately releasing the semaphore.
A separate control thread occupies the control semaphore for a few milliseconds if the temperature is too high.
This often works well and the CPU temperature is stable.
I do not care much about a slight delay in suspending the threads.
However, when one task takes some seconds longer than the others, its thread will continue to run alone.
This activates CPU turbo mode, which is the opposite of what I want to achieve (it is comparatively power inefficient, thus bad for thermals).
I cannot deactivate CPU turbo as I do not control the hardware.
In other words, the tasks take too long to complete.
So I want to forcefully pause them from outside.
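For reference, the check-between-tasks scheme described in this question could look roughly like the sketch below. The class name PauseGate and its methods are hypothetical, not the asker's actual code, and it shares the limitation described above: a worker only notices the pause when it finishes its current task.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper: workers call wait_until_running() between tasks;
// a control thread calls pause() and resume().
class PauseGate {
public:
    void pause() {
        std::lock_guard<std::mutex> lock(mutex_);
        paused_ = true;
    }

    void resume() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            paused_ = false;
        }
        resumed_.notify_all();  // wake every worker blocked at the gate
    }

    // Blocks while the gate is paused, for at most `timeout`.
    // Returns true if the gate is open when the call returns.
    bool wait_until_running(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return resumed_.wait_for(lock, timeout, [this] { return !paused_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable resumed_;
    bool paused_ = false;
};
```

The timeout is there so a worker can periodically check for shutdown instead of blocking forever if resume() is never called.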
"I want to suspend them when the CPU temperature rises above a threshold."
In general, that is putting the cart before the horse.
Properly designed hardware should have adequate cooling for maximum load and your program should not be able to exceed that cooling capacity.
In addition, since you are talking about Turbo, we can assume an Intel CPU, which will thermally throttle on its own, making your program run slower without you doing anything.
"In other words, the tasks take too long to complete"
You could break the tasks into smaller parts, and check the semaphore more often.
"A separate control thread occupies the control semaphore for a few milliseconds"
It's really unlikely that your hardware can react to millisecond delays -- that's too short a timescale for anything thermal. You will probably be better off monitoring the temperature and simply reducing the number of tasks you are scheduling when the temperature is rising and getting close to your limits.
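As a rough illustration of "reducing the number of tasks you are scheduling", the sketch below maps a temperature reading to a worker count. The policy (all workers below a soft threshold, none at or above a hard limit, linear in between), the function name allowed_workers and the parameter names are all assumptions made for this example, not something the answer prescribes.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical policy: how many workers may start a new task at `temp_c`.
// Below `soft_c` everything runs; at or above `hard_c` nothing new starts;
// in between the count shrinks linearly with the remaining headroom.
int allowed_workers(double temp_c, double soft_c, double hard_c, int max_workers) {
    assert(soft_c < hard_c && max_workers >= 0);
    if (temp_c <= soft_c) return max_workers;
    if (temp_c >= hard_c) return 0;
    const double headroom = (hard_c - temp_c) / (hard_c - soft_c);  // 1.0 down to 0.0
    const int n = static_cast<int>(headroom * max_workers);        // rounds down
    return std::max(0, std::min(n, max_workers));
}
```

With soft_c = 70, hard_c = 90 and 8 workers, a reading of 80 allows 4 workers. A scheduler would re-evaluate this each time a worker asks for its next task.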
"I've now implemented it with pthread_kill and SIGRT."
Note that suspending threads in unknown state (whatever the target task was doing at the time of signal receipt) is a recipe for deadlocks. The task may be inside malloc, may be holding arbitrary locks, etc. etc.
If your "control thread" also needs that lock, it will block and you lose. Your control thread must execute only direct system calls, may not call into libc, etc. etc.
This solution is ~impossible to test, and ~impossible to implement correctly.

Why does Sleep(500) cost more than 500ms?

I used Sleep(500) in my code and used GetTickCount() to test the timing. I found that it takes about 515 ms, more than 500. Does somebody know why that is?
Because Win32 API's Sleep isn't a high-precision sleep, and has a limited granularity.
The best way to get a precision sleep is to sleep a bit less (~50 ms) and do a busy-wait. To find the exact amount of time you need to busywait, get the resolution of the system clock using timeGetDevCaps and multiply by 1.5 or 2 to be safe.
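The sleep-then-spin idea can be sketched as follows. It is written with std::chrono instead of the Win32 calls so it stays portable; precise_sleep and spin_margin are names invented here, and spin_margin plays the role of the safety margin described above (for example 1.5 to 2 times the timer resolution).

```cpp
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Sleeps for most of `total`, then busy-waits for the final `spin_margin`
// so that the function returns close to the deadline.
void precise_sleep(std::chrono::microseconds total, std::chrono::microseconds spin_margin) {
    const auto deadline = Clock::now() + total;
    if (total > spin_margin) {
        // Coarse part: let the OS run other threads.
        std::this_thread::sleep_for(total - spin_margin);
    }
    while (Clock::now() < deadline) {
        // Fine part: burns CPU. The thread can still be preempted here,
        // so this narrows the error but does not guarantee an upper bound.
    }
}
```

If the coarse sleep oversleeps by more than spin_margin, the loop exits immediately and the call is late anyway, which is why the margin has to be chosen from the observed timer resolution.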
sleep(500) guarantees a sleep of at least 500ms.
But it might sleep for longer than that: the upper limit is not defined.
In your case, there will also be the extra overhead in calling GetTickCount().
Your non-standard Sleep function may well behave in a different manner; but I doubt that exactness is guaranteed. To do that, you need special hardware.
As you can read in the documentation, the WinAPI function GetTickCount() is limited to the resolution of the system timer, which is typically in the range of 10 milliseconds to 16 milliseconds.
To get a more accurate time measurement, use the function GetSystemTimePreciseAsFileTime.
Also, you can not rely on Sleep(500) to sleep exactly 500 milliseconds. It will suspend the thread for at least 500 milliseconds. The operating system will then continue the thread as soon as it has a timeslot available. When there are many other tasks running on the operating system, there might be a delay.
In general sleeping means that your thread goes to a waiting state and after 500ms it will be in a "runnable" state. Then the OS scheduler chooses to run something according to the priority and number of runnable processes at that time. So if you do have high precision sleep and high precision clock then it is still a sleep for at least 500ms, not exactly 500ms.
Like the other answers have noted, Sleep() has limited accuracy. Actually, no implementation of a Sleep()-like function can be perfectly accurate, for several reasons:
It takes some time to actually call Sleep(). While an implementation aiming for maximal accuracy could attempt to measure and compensate for this overhead, few bother. (And, in any case, the overhead can vary due to many causes, including CPU and memory use.)
Even if the underlying timer used by Sleep() fires at exactly the desired time, there's no guarantee that your process will actually be rescheduled immediately after waking up. Your process might have been swapped out while it was sleeping, or other processes might be hogging the CPU.
It's possible that the OS cannot wake your process up at the requested time, e.g. because the computer is in suspend mode. In such a case, it's quite possible that your 500ms Sleep() call will actually end up taking several hours or days.
Also, even if Sleep() was perfectly accurate, the code you want to run after sleeping will inevitably consume some extra time.
Thus, to perform some action (e.g. redrawing the screen, or updating game logic) at regular intervals, the standard solution is to use a compensated Sleep() loop. That is, you maintain a regularly incrementing time counter indicating when the next action should occur, and compare this target time with the current system time to dynamically adjust your sleep time.
Some extra care needs to be taken to deal with unexpected large time jumps, e.g. if the computer was temporarily suspended or if the tick counter wrapped around, as well as the situation where processing the action ends up taking more time than is available before the next action, causing the loop to lag behind.
Here's a quick example implementation (in pseudocode) that should handle both of these issues:
const int interval = 500;                    // milliseconds between actions
const int giveUpThreshold = 10 * interval;
DWORD nextTarget = GetTickCount();
bool active = doAction();                    // doAction() is your own function
while (active) {
    nextTarget += interval;
    // unsigned subtraction, then cast: stays correct when the tick counter wraps
    int delta = (int)(nextTarget - GetTickCount());
    if (delta > giveUpThreshold || delta < -giveUpThreshold) {
        // either we're hopelessly behind schedule, or something
        // weird happened; either way, give up and reset the target
        nextTarget = GetTickCount();
    } else if (delta > 0) {
        Sleep(delta);
    }
    active = doAction();
}
This will ensure that doAction() will be called on average once every interval milliseconds, at least as long as it doesn't consistently consume more time than that, and as long as no large time jumps occur. The exact time between successive calls may vary, but any such variation will be compensated for on the next iteration.
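For readers not on Win32, the same compensated loop can be sketched with std::chrono and absolute deadlines. This is a portable variant, not the code above: run_periodic is a hypothetical name, and because steady_clock never jumps backwards only the "hopelessly behind" reset remains.

```cpp
#include <chrono>
#include <functional>
#include <thread>

using Clock = std::chrono::steady_clock;

// Calls action() about once per `interval` until it returns false.
// Returns the number of calls made.
int run_periodic(std::chrono::milliseconds interval, const std::function<bool()>& action) {
    const auto give_up = 10 * interval;
    auto next = Clock::now();
    int calls = 1;
    bool active = action();
    while (active) {
        next += interval;  // targets advance on a fixed grid, so errors do not accumulate
        const auto now = Clock::now();
        if (next + give_up < now) {
            next = now;    // hopelessly behind schedule: reset the target
        } else if (next > now) {
            std::this_thread::sleep_until(next);
        }
        active = action();
        ++calls;
    }
    return calls;
}
```

Sleeping until an absolute target, rather than for a relative delay, is what makes a late wake-up shorten the following wait instead of shifting every later call.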
The default timer resolution is low; you can increase it if necessary. Example from MSDN:
#define TARGET_RESOLUTION 1 // 1-millisecond target resolution
TIMECAPS tc;
UINT wTimerRes;
if (timeGetDevCaps(&tc, sizeof(TIMECAPS)) != TIMERR_NOERROR)
{
    // Error; application can't continue.
}
wTimerRes = min(max(tc.wPeriodMin, TARGET_RESOLUTION), tc.wPeriodMax);
timeBeginPeriod(wTimerRes);
There are two general reasons why code might want a function like "sleep":
It has some task which can be performed at any time that is at least some distance in the future.
It has some task which should be performed as near as possible to some moment in time some distance in the future.
In a good system, there should be separate ways of issuing those kinds of requests; Windows makes the first easier than the second.
Suppose there is one CPU and three threads in the system, all doing useful work until, one second before midnight, one of the threads says it won't have anything useful to do for at least a second. At that point, the system will devote execution to the remaining two threads. If, 1ms before midnight, one of those threads decides it won't have anything useful to do for at least a second, the system will switch control to the last remaining thread.
When midnight rolls around, the original first thread will become available to run, but since the presently-executing thread will have only had the CPU for a millisecond at that point, there's no particular reason the original first thread should be considered more "worthy" of CPU time than the other thread which just got control. Since switching threads isn't free, the OS may very well decide that the thread that presently has the CPU should keep it until it blocks on something or has used up a whole time slice.
It might be nice if there were a version of "sleep" which were easier to use than multi-media timers but would request that the system give the thread a temporary priority boost when it becomes eligible to run again, or better yet a variation of "sleep" which would specify a minimum time and a "priority-boost" time, for tasks which need to be performed within a certain time window. I don't know of any systems that can be easily made to work that way, though.

Can I guarantee that Sleep() would not sleep for more than 10 ms?

I know that Sleep() is not accurate, but is there a way to make it not sleep for more than 10 ms (i.e. only sleep between 1 ms and 10 ms)? Or does Sleep(1) already guarantee that?
If you really want guaranteed timings, you will not be using Windows at all.
To answer your question, Sleep() does not provide any means of guaranteeing an upper bound on the sleep time.
On Windows, this is because Sleep() relinquishes the thread's time slice, and it is not guaranteed that the system scheduler will schedule the sleeping thread (i.e. allocate another time slice) to execute immediately after the sleep time is up. That depends on priorities of competing threads, scheduling policies, and things like that.
In practice, the actual sleep interval depends a lot on what other programs are running on the system, configuration of the system, whether other programs are accessing slow drives, etc etc.
With a lightly loaded system, it is a fair bet Sleep(1) will sleep between 1 and 2 ms on any modern machine (GHz-frequency CPU or better). However, it is not impossible for your program to experience greater delays.
With a heavily loaded system (lots of other programs executing, using CPU and timer resources), it is a fair bet your program will experience substantially greater delays than 1ms, and even more than 10ms.
In short: no guarantees.
There is no way to guarantee it.
This is what real time OS are for.
In the general case, if your OS isn't under high load, sleep will be pretty accurate, but the more load you put on it, the more inaccurate it will get.
No. Or, yes, depending on your perspective.
According to the documentation:
After the sleep interval has passed, the thread is ready to run. If
you specify 0 milliseconds, the thread will relinquish the remainder
of its time slice but remain ready. Note that a ready thread is not
guaranteed to run immediately. Consequently, the thread may not run
until some time after the sleep interval elapses. For more
information, see Scheduling Priorities.
What this means is that the problem isn't Sleep. Rather, when Sleep ends, your thread may still need to wait to become active again.
You cannot count on 10 milliseconds, that's too low. Sleep() accuracy is affected by:
The clock tick interrupt frequency. In general, the processor tends to be in a quiescent state, not consuming any power and turned off by the HLT instruction. It is dead to the world, unaware that time is passing and unaware that your sleep interval has expired. A periodic hardware interrupt generated by the chipset wakes it up and makes it pay attention again. By default, this interrupt is generated 64 times per second. Or once every 15.625 milliseconds.
The thread scheduler runs at every clock interrupt. It is the one that notices that your sleep interval has expired, it will put the thread back into the ready-to-run state. And boosts its priority so that it is more likely to acquire a processor core. It will do so when no other threads with higher priority are ready to run.
There isn't much you can do about the 2nd bullet, you have to compete with everybody else and take your fair share. If the thread does a lot of sleeping and little computation then it is not unreasonable to claim more than your fair share, call SetThreadPriority() to boost your base priority and make it more likely that your sleep interval is accurate. If that isn't good enough then the only way to claim a high enough priority that will always beat everybody else is by writing ring 0 code, a driver.
You can mess with the 1st bullet, it is pretty common to do so. Also the reason why many programmers think that the default accuracy is 10 msec. Or if they use Chrome that it might be 1 msec, that browser jacks up the interrupt rate sky-high. A fairly unreasonable thing to do, bad for power consumption, unless you are in the business of making your mobile operating system products look good :)
Call timeBeginPeriod() before you need short sleep intervals, and timeEndPeriod() when you're done. Use NtSetTimerResolution() if you need to go lower than 1 msec.
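The effect of the tick interval described in this answer can be illustrated with a deliberately simplified model: ticks happen at multiples of the tick period, and a sleeping thread becomes ready at the first tick at or after its expiry. The function name ready_time_us is hypothetical, and the model ignores any further delay before the ready thread actually gets a core.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model. All times are in microseconds, counted from a tick.
// Returns the time of the first timer tick at or after now_us + sleep_us.
std::int64_t ready_time_us(std::int64_t now_us, std::int64_t sleep_us, std::int64_t tick_us) {
    assert(tick_us > 0 && now_us >= 0 && sleep_us >= 0);
    const std::int64_t expiry = now_us + sleep_us;
    const std::int64_t ticks = (expiry + tick_us - 1) / tick_us;  // round up to a whole tick
    return ticks * tick_us;
}
```

With the default 15.625 ms tick, a 1 ms sleep requested at t = 3 ms becomes ready at 15.625 ms, so it takes about 12.6 ms. With a 1 ms tick, the same request at t = 3.4 ms becomes ready at 5 ms, so it takes 1.6 ms.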
Sleep won't guarantee that.
The only way I know of doing that is to have a thread wait for a fast timer event and free a synchronization object every 10 ms or so.
You will pass a semaphore to this "wait server task", and it will free it on the next timer tick, thus giving you a response time between 0 and 10 ms.
Of course, if you want extreme precision you will have to boost this thread's priority above other tasks that might preempt it, and at any rate you might still be preempted by system processes and/or interrupt handlers, which will add some noise to your timer.

Linux, need accurate program timing. Scheduler wake up program

I have a thread running on a Linux system which I need to execute at intervals that are as accurate as possible, e.g. once every ms.
Currently this is done by creating a timer with
timerfd_create(CLOCK_MONOTONIC, 0)
and then passing the desired sleep time in a struct with
timerfd_settime (fd, 0, &itval, NULL);
A blocking read call is performed on this timer which halts thread execution and reports lost wakeup calls.
The problem is that at higher frequencies, the system starts missing deadlines, even though CPU usage is below 10%. I think this is due to the scheduler not waking the thread often enough to check the blocking call. Is there a command I can use to tell the scheduler to wake the thread at certain intervals, as far as that is possible?
Busy-waiting is a bad option since the system handles many other tasks.
Thank you.
You need to get RT linux*, and then increase the RT priority of the process that you want to wake up at regular intervals.
Other than that, I do not see problems in your code, and if your process is not getting blocked, it should work fine.
(*) RT linux - an os with some real time scheduling patches applied.
One way to reduce scheduler latency is to run your process using the realtime scheduler such as SCHED_FIFO. See sched_setscheduler .
This will generally improve latency a lot, but there's still little guarantee. To further reduce latency spikes, you'll need to move to the realtime branch of Linux, or a realtime OS such as VxWorks, RTEMS or QNX.
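A minimal sketch of requesting SCHED_FIFO on Linux is shown below. The helper names clamp_fifo_priority and request_fifo are made up for this example; the call normally fails with EPERM unless the process has the required privilege (root or CAP_SYS_NICE).

```cpp
#include <algorithm>
#include <sched.h>

// Clamps a requested priority into the valid SCHED_FIFO range.
int clamp_fifo_priority(int wanted) {
    const int lo = sched_get_priority_min(SCHED_FIFO);
    const int hi = sched_get_priority_max(SCHED_FIFO);
    return std::max(lo, std::min(wanted, hi));
}

// Asks the kernel to schedule the caller under SCHED_FIFO.
// Returns false on failure; errno then holds the reason.
bool request_fifo(int wanted_priority) {
    sched_param param{};
    param.sched_priority = clamp_fifo_priority(wanted_priority);
    return sched_setscheduler(0, SCHED_FIFO, &param) == 0;
}
```

A SCHED_FIFO thread that never blocks can starve everything below it on that core, so this only suits threads that spend most of their time waiting, such as the timer loop in the question.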
You won't be able to do what you want unless you run it on an actual "Real Time OS".
If this is only Linux on an x86 system, I would choose the HPET timer. I think all modern PCs have this hardware timer built in, and it is very, very accurate. It allows you to define a callback that will be called every millisecond, and in this callback you can do your calculations (if they are simple) or just trigger work on another thread using some synchronization object (a condition variable, for example).
Here is an example of how to use this timer: http://blog.fpmurphy.com/2009/07/linux-hpet-support.html
Along with other advice such as setting the scheduling class to SCHED_FIFO, you will need to use a Linux kernel compiled with a high enough tick rate that it can meet your deadline.
For example, a kernel compiled with CONFIG_HZ of 100 or 250 Hz (timer interrupts per second) can never respond to timer events faster than that.
You must also set your timer to be just a little bit faster than you actually need, because timers are allowed to go beyond their requested time but never expire early, this will give you better results. If you need 1 ms, then I'd recommend asking for 999 us instead.
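Putting the question's timerfd approach together with the "ask for slightly less" advice gives something like the sketch below (Linux only). The function names make_periodic_timer and wait_for_tick are hypothetical, and period_ns is assumed to be greater than zero, since a zero initial value leaves the timer disarmed.

```cpp
#include <cstdint>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

// Creates a periodic CLOCK_MONOTONIC timer with the given period (> 0).
// Returns the file descriptor, or -1 on error.
int make_periodic_timer(long period_ns) {
    const int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd < 0) return -1;
    itimerspec spec{};
    spec.it_interval.tv_sec = period_ns / 1000000000L;
    spec.it_interval.tv_nsec = period_ns % 1000000000L;
    spec.it_value = spec.it_interval;  // first expiry after one period
    if (timerfd_settime(fd, 0, &spec, nullptr) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Blocks until the timer has expired at least once. Returns the number of
// expirations since the previous read (more than 1 means wake-ups were
// missed), or 0 on error.
std::uint64_t wait_for_tick(int fd) {
    std::uint64_t expirations = 0;
    const ssize_t n = read(fd, &expirations, sizeof expirations);
    return n == static_cast<ssize_t>(sizeof expirations) ? expirations : 0;
}
```

For a 1 ms target, make_periodic_timer(999000) requests 999 us as suggested above. Logging every wait_for_tick() result greater than 1 shows how often deadlines are actually being missed.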

Why does an empty loop use so much processor time?

If I have an empty while loop in my code such as:
while(true);
It will drive the processor usage up to about 25%. However if I do the following:
while(true)
Sleep(1);
It will only use about 1%.
So why is that?
Update: Thanks for all the great replies, but I guess I really should have asked this question: What's the algorithm behind sleep()? That is more of what I wanted to know.
With the former, the condition true must be checked by the processor as often as the application can possibly get focus. As soon as the application gets processor attention, it checks true, just like any other loop condition, to see if the next instruction in the loop can be executed. Of course, the next instruction in the loop is also true. Basically, you are forcing the processor to constantly determine whether or not true is true, which, although trivial, when performed constantly bogs the processor down.
In the latter, the processor checks true once, and then executes the next statement. In this case, that is a 1ms wait. Since 1 ms is far greater than the amount of time required to check true, and the processor knows it can do other things during this wait, you free up a lot of power.
I am going to guess that you have four cores on your multicore processor, which would explain the 25%, as you are completely tying up one processor doing your busy loop, as the only break in it is when the application is delayed so another application can run (but that may not happen depending on the load).
When you put a thread to sleep then it allows the OS to do other operations, and it knows when, as a minimum, to come back and wake up the thread, so it can continue its work.
The first continuously uses CPU operations. The latter switches the context of the currently running thread, putting it in sleep mode, thus allowing for other processes to be scheduled.
You have a quad-core machine, am I right? If so,
while(true);
is actually using 100% of one of your cores.
To the operating system, it seems your program has a lot of work to do. So the operating system lets the program go ahead with this work. It can't tell the difference between your program number crunching like crazy, and doing a useless infinite loop.
Sleep(1);
on the other hand explicitly tells the operating system that you have no work to do for the next millisecond. The OS will thus stop running your program and let other programs do work.
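The difference can be measured by comparing the CPU time consumed during the same wall-clock interval. The sketch below assumes a platform where std::clock() reports processor time used by the process, as on Linux; Microsoft's C runtime reports elapsed wall-clock time instead, so the comparison would not work there. The function names are made up for this example.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

// CPU time (in ms) this process consumes while spinning for `wall_ms`.
double cpu_ms_spinning(int wall_ms) {
    const std::clock_t cpu_start = std::clock();
    const auto deadline =
        std::chrono::steady_clock::now() + std::chrono::milliseconds(wall_ms);
    while (std::chrono::steady_clock::now() < deadline) {
        // busy loop: the thread stays runnable the whole time
    }
    return 1000.0 * (std::clock() - cpu_start) / CLOCKS_PER_SEC;
}

// CPU time (in ms) this process consumes while sleeping for `wall_ms`.
double cpu_ms_sleeping(int wall_ms) {
    const std::clock_t cpu_start = std::clock();
    std::this_thread::sleep_for(std::chrono::milliseconds(wall_ms));
    return 1000.0 * (std::clock() - cpu_start) / CLOCKS_PER_SEC;
}
```

On an otherwise idle machine the spinning version should report close to wall_ms of CPU time, while the sleeping version should report close to zero.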
An empty loop isn't actually empty. A loop in itself is at least a comparison and a jump back to the comparison. Modern CPUs can do millions of these operations per second.
The sleep statement in the second loop relinquishes control to the operating system for at least 1 millisecond. In this state, the application is effectively halted and does not continue processing. The result of halting for x amount of time reduces the number of comparisons, and hence the % of cpu clock cycles the cpu can execute per second.
Concerning the 25%, Intel processors that support Hyperthreading or multi core processors might taint the performance statistics. The empty loop is effectively topping off at least one processor core.
Back in the day when multicore CPUs didn't exist, the users did have the need for multi processing/tasking. There are a couple of ways the illusion of running multiple processes at the same time was achieved.
One way was to design applications in such a way that they needed to relinquish control to the system every so often, so as to let other processes run for a while. This was the case in the old Windows versions. If a given application was badly designed so that it didn't relinquish control, or got stuck in an endless loop, your entire PC effectively froze up.
Needless to say this wasn't the best way, and it was replaced by preemptive multitasking. Here a Programmable Interval Timer is instructed to interrupt the running process at a given interval to execute some scheduler code that lets the other processes have a go.
Basically, you've got several "process scheduler" states. I'll name three of them.
One: Ready
Two: Running
Three: Blocked
These states / queues only exist because of the limited number of cores on your processor. In Ready, processes are scheduled that are totally ready for execution. They don't have to wait for input, time, or whatever. On Running, processes actually "have" the processor and thus ARE running. State Blocked means your process is waiting for an event to happen before queueing for the processor.
When you keep on testing for while(true) you keep your process in the "ready" queue. Your process scheduler gives it a certain amount of time, and after a while, removes it from the processor (placing it on the back of the "ready" queue). And thus your process will keep coming back "on" the processor, keeping it busy.
When you execute a "sleep" statement, your process will not be scheduled on the processor until the prerequisite is fulfilled; in this particular case, not until at least 1 ms has passed since the "sleep" call.
Sleep() is not really doing anything during the period that the thread is sleeping. It hands its time over to other processes to use. A loop on the other hand is continuously checking to see if the condition is true or false.
Because the Sleep is basically telling the processor to switch contexts and let some other programs get more CPU time.
The sleep in the second one is kinda like a 'yield' to the OS's process scheduler.
Because you're keeping the processor busy evaluating the loop over and over.
Using Sleep actually lets other threads execute on the CPU and along with a very short context switch looks as if the CPU is free for a short while.
A CPU can do some billion operations per second. This means the empty loop runs maybe one million times per second. The loop with the sleep statement runs only 1000 times per second. In this case the CPU has some operations per second left to do other things.
Say we have a 3 GHz CPU. 3 GHz = 3 000 000 000 Hz, so the CPU can run the loop three billion times a second (simplified).
With the sleep statement the loop is executed 1000 times a second. This means the CPU load is
1000 / 3 000 000 000 * 100 ≈ 0.00003%
Because that would run instructions all the time.
Normal programs don't madly run instructions all the time.
For instance, GUI programs just sit idle waiting for events (such as keyboard input).
Note: sitting idle != while(true);
They only run instructions when events arrive, and the event-handling code is usually small and runs very quickly, (otherwise, the program will appear to be not-responding). Imagine your app gets 5 keystrokes per second, how much CPU time would it take?
That's why normal processes don't take that much CPU.
Now, earlier I said that sitting idle is not the same as an infinite empty loop. Why is that? Sitting idle means telling the OS that you don't have anything to run.
An infinite loop actually is something to run (a repeating jump instruction).
On the other hand, having nothing to run means basically that the OS won't even bother giving you any processor time at all, even if it's your turn.
Another example where programs sit idle is loading files: when you need to load a file, you basically send a signal to the disk and wait for it to find the data and load it into memory. While the data is being loaded (several milliseconds), the process just sits idle, doing nothing.
Yet another instance of a process sitting idle, is Sleep(1), here it's explicitly telling the OS not to give it any cpu-time before the specified time has passed.