Sleep() becomes less accurate after replacing a PC? (C++)

I have a program that was built in C++ (MFC, Visual Studio 6.0) several years ago and has been running on a certain Windows machine for quite some time (more than 5 years). The PC was replaced a month ago (the old one died), and since then the program's timing behavior changed. I need help understanding why.
The main functionality of the program is to respond to keystrokes by sending out ON and OFF signals to an external card, with very accurate delay between the ON and the OFF. An example program flow:
> wait for keystroke...
> ! keystroke occurred
> send ON message
> wait 150ms
> send OFF message
Different keystrokes have different waiting periods associated with them, between 20ms and 150ms (a very deterministic time depending on the specific keystroke). The timing is very important. The waiting is executed using a simple Sleep(). The accuracy of the sleep on the old PC was 1-2ms deviation. I can measure the timing externally to the computer (on the external card), so my measurement of the sleep time is very accurate. Please take into account that this machine executed such ON-sleep-OFF cycles thousands of times a day for years, so the accuracy data I have is sound.
Since the PC was replaced the timing deviation is more than 10ms.
I did not install the previous PC, so it may have had some additional software packages installed. Also, I'm ashamed to admit I don't remember whether the previous PC ran Windows 2000 or Windows XP. I'm quite sure it was XP, but not 100% (and I can't check now...). The new one is Windows XP.
I tried changing the sleeping mechanism to be based on timers, but the accuracy did not improve.
Can anything explain this change? Is there a software package that may have been installed on the previous PC that may fix the problem? Is there a best practice to deal with the problem?

The time resolution on XP is around 10ms - the system basically "ticks" every 10ms. Sleep is not a very good way to do accurate timing for that reason. I'm pretty sure Win2000 has the same resolution, but if I'm wrong that could be a reason.
You can change that resolution, at least down to 1ms - see http://technet.microsoft.com/en-us/sysinternals/bb897569.aspx or use http://www.lucashale.com/timerresolution/ - there's probably a registry key as well (Windows Media Player will change that timer too, probably only while it's running).
It could be that the resolution was somehow altered on your old machine.
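If you want to raise the resolution from your own program, here is a minimal sketch using the multimedia timer API (assumes linking with winmm.lib; this is not something the original program is known to have done):
#include <windows.h>
#include <mmsystem.h>

int main()
{
    TIMECAPS tc;
    timeGetDevCaps(&tc, sizeof(tc));  // query the supported resolution range
    timeBeginPeriod(tc.wPeriodMin);   // request the minimum, typically 1 ms

    Sleep(150);                       // Sleep now honors ~1 ms granularity

    timeEndPeriod(tc.wPeriodMin);     // always undo the request before exit
    return 0;
}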

If your main concern is precision, consider using a spinlock. The Sleep() function is a hint to the scheduler not to reschedule the given thread for at least x ms; there's no guarantee that the thread will sleep for exactly the time specified.
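A hedged sketch of that approach - a busy-wait on the high-resolution performance counter (the function name SpinWaitMs is illustrative, not a Win32 API):
#include <windows.h>

void SpinWaitMs(double ms)
{
    LARGE_INTEGER freq, start, now;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);
    const LONGLONG target = start.QuadPart +
        (LONGLONG)(ms * freq.QuadPart / 1000.0);
    do {
        QueryPerformanceCounter(&now);   // burns a full core while waiting
    } while (now.QuadPart < target);
}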

Usually Sleep() will result in a delay of ~15 ms, or a period that is a multiple of ~15 ms, depending on the sleep value.
One good way to find out how it works is the following loop (requires <windows.h> and <cstdio>):
while (true) {
    printf("%lu\n", (unsigned long)GetTickCount());
    Sleep(1);
}
It will also show that the behavior of this code differs between, say, Windows XP and Vista/Windows 7.

As others have mentioned, Sleep has coarse accuracy.
I typically use Boost.Asio for this kind of timing:
// Set up the io_service and a deadline_timer bound to it
boost::asio::io_service io_service;
boost::asio::deadline_timer timer(io_service);
// Configure the wait period and block until the timer expires
timer.expires_from_now(boost::posix_time::millisec(5));
timer.wait();
Asio uses the most effective implementation for your platform; on Windows I believe it uses overlapped IO.
If I set the time period to 1ms and loop the expires_from_now()/wait() calls 10000 times, the total duration is typically about 10005-10100 ms. Very accurate, cross-platform code (though accuracy differs on Linux) and very easy to read.
I can't explain why your previous PC was so accurate though; Sleep has been +/- 10ms whenever I've used it - worse if the PC is busy.

Is your new PC multi-core and the old one single-core? The difference in timing accuracy may be due to the use of multiple threads and context switching.

Sleep is dependent on the system clock. Your new machine probably has a different timing than your previous machine. From the documentation:
This function causes a thread to relinquish the remainder of its time slice and become unrunnable for an interval based on the value of dwMilliseconds. The system clock "ticks" at a constant rate. If dwMilliseconds is less than the resolution of the system clock, the thread may sleep for less than the specified length of time. If dwMilliseconds is greater than one tick but less than two, the wait can be anywhere between one and two ticks, and so on. To increase the accuracy of the sleep interval, call the timeGetDevCaps function to determine the supported minimum timer resolution and the timeBeginPeriod function to set the timer resolution to its minimum. Use caution when calling timeBeginPeriod, as frequent calls can significantly affect the system clock, system power usage, and the scheduler. If you call timeBeginPeriod, call it one time early in the application and be sure to call the timeEndPeriod function at the very end of the application.
The documentation seems to imply that you can attempt to make it more accurate, but I wouldn't try that if I were you. Just use a timer.
What timers did you replace it with? If you used SetTimer(), that timer sucks too.
The correct solution is to use the higher-resolution TimerQueueTimer.
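A minimal one-shot sketch of that API (the 150 ms delay and the event handshake are illustrative, not the asker's code):
#include <windows.h>
#include <cstdio>

VOID CALLBACK OnTimer(PVOID param, BOOLEAN /*timerOrWaitFired*/)
{
    printf("timer fired\n");      // e.g. send the OFF message here
    SetEvent((HANDLE)param);      // signal the waiting thread
}

int main()
{
    HANDLE done  = CreateEvent(NULL, FALSE, FALSE, NULL);
    HANDLE timer = NULL;
    // due time 150 ms, period 0 => fires exactly once
    CreateTimerQueueTimer(&timer, NULL, OnTimer, done, 150, 0,
                          WT_EXECUTEINTIMERTHREAD);
    WaitForSingleObject(done, INFINITE);
    DeleteTimerQueueTimer(NULL, timer, INVALID_HANDLE_VALUE);
    CloseHandle(done);
    return 0;
}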

Related

Boost sleep inaccurate? [duplicate] - same answers as "Are longer sleeps (in C++) less precise than short ones" below.

Do timers have better precision compared to sleep()

A long time ago I had a bug in my program. The root cause was that the C function
sleep(60);
would on rare occasions sleep less than 60 seconds. Or the function did cause the thread to sleep more than 60 s, but the clock was changed automatically by the OS (this seems likely since the bug was happening only at XX:00:00), i.e., it manifested itself rarely, and only on the "round hour" (the sleep should have ended at >x h 0 m 0 s, but it ended at x-1 h 59 m 59.99* s).
Then my project manager went on a rant about how he had said a million times that we should only use timers, not sleep.
From that time on I accepted the notion that timers are more accurate than sleep(), but now I feel I should ask for a more authoritative source.
So:
Are timers more precise than sleep?
(Related) Are they, deep down (at the OS level), implemented using different methods?
I know timers are used to do callbacks while sleep just delays execution of the current thread; I'm talking about the delay-execution part of the implementation.
BTW, the OS was Linux, but I'd like a general answer if possible.
Timers are definitely more accurate than sleep. Sleep is meant as just a rough measure of how long until the task scheduler revives a thread or process. Changes to the system clock, an overloaded task scheduler, etc. will affect how long sleep actually sleeps for.
A timer will measure time more accurately. There are two kinds of timers: ones based on the system clock, like the functions in time.h. Those will be affected by changes to the system clock - for example, if you change the system time, switch from daylight saving time, or suspend the machine, the actual measured time may differ from the real time.
The other kind are high-resolution timers based on CPU ticks, such as QueryPerformanceCounter on Windows and clock_gettime() on Linux. These simply count CPU cycles. They won't be affected by changes to the system timer, but they will deviate from real-world time in two ways:
Time will skew over long periods, because the clock resolution is not exact, so measurements made this way will drift away from real time.
If the machine is suspended, the CPU stops and the timer will not account for this.
What you want to do is sleep for a much shorter amount of time and use the clock that has the appropriate resolution. E.g., if you need to sleep for less than a few minutes, use high-resolution timers: sleep in slices 100x shorter than the total you need, and check the elapsed time each time sleep returns to see whether the right amount of time has passed. If you need to sleep for more than a few minutes, do the same but check elapsed time with the functions in time.h.
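A sketch of that pattern (assuming POSIX; the wait_ms name and the 1 ms slice are illustrative), using a monotonic clock so system-time changes don't distort the wait:
#include <time.h>

void wait_ms(long target_ms)
{
    timespec start, now;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;) {
        clock_gettime(CLOCK_MONOTONIC, &now);
        long elapsed_ms = (now.tv_sec - start.tv_sec) * 1000L +
                          (now.tv_nsec - start.tv_nsec) / 1000000L;
        if (elapsed_ms >= target_ms)
            break;                         // target duration reached
        timespec slice = {0, 1000000L};    // sleep in 1 ms slices
        nanosleep(&slice, NULL);
    }
}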
If you need to be 100% accurate with time you may need specialized hardware, or to check real time periodically against an online time server, such as the Navy's atomic clock (http://tycho.usno.navy.mil/ntp.html).
There is no general answer for the simple reason that there is nothing in either the C or C++ standard that provides the ability to put an application to sleep. So the discussion is inherently going to be OS-dependent.
The unix sleep() function has a coarse granularity. There are also usleep() and nanosleep(), which have much finer granularity. The function select() can also be used to put an application to sleep: simply specify a timeout and no file descriptors.
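For instance, a sketch of the select()-as-sleep idiom (the 500 ms timeout is illustrative):
#include <sys/select.h>
#include <stddef.h>

// Sleep for roughly 500 ms using select() with no file descriptors.
static void select_sleep(void)
{
    struct timeval tv;
    tv.tv_sec  = 0;
    tv.tv_usec = 500 * 1000;            // timeout: 500 ms
    select(0, NULL, NULL, NULL, &tv);   // returns 0 when the timeout expires
}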
Note #1: The interaction between sleep(), usleep(), nanosleep(), itimers, and alarms is unspecified.
Note #2: Don't expect any of these mechanisms to have the precision of an atomic clock.

Are longer sleeps (in C++) less precise than short ones

I have a task to do something every "round" minute (at xx:xx:00), and I use something like
const int statisticsInterval = 60;
time_t t = 0;
while (1)
{
    // avoid multiple calls in the same second that is a multiple of 60
    if (abs(t - time(NULL)) == 0)
        boost::this_thread::sleep(boost::posix_time::seconds(2)); // 2, not 1, to make sure a full second passes
    t = time(NULL);
    boost::this_thread::sleep(boost::posix_time::seconds(statisticsInterval - (t % statisticsInterval)));
    // DO WORK
}
As you can see, I sleep for (60 sec - number of elapsed seconds in the current minute). But one programmer told me that this is not precise and that I should change it to a while loop with sleep(1) inside. I consider it highly doubtful that he is right, but I just wanted to check whether anybody knows if precision suffers when the sleep interval is long.
I presume that sleep is implemented so that at a certain time in the future a trigger is activated and the thread is put into the "ready to execute" group, so I see no reason for a difference in precision. BTW, the OS is Ubuntu, and I don't care about errors of less than 2-3 seconds. For example, if I ask to sleep for 52 seconds, a 53.8-second sleep is totally acceptable.
P.S. I know that sleep defines the minimal time, and that theoretically my thread might get activated in the year 2047, but I'm asking about realistic scenarios.
When you do sleep(N) it tells the OS to trigger the thread at current time + N.
The reason it isn't always accurate is that you're not the only thread in the system.
There might be another thread that asked to be woken at that time before you, and there might just be some important OS work that needs to be performed exactly at that time.
Anyway, there shouldn't be any precision issues, because the mechanism has nothing to do with N.
The only reason it won't be "precise" is if it's a crappy OS that can't calculate the time right. And then again, the loop won't solve that.
In some threading APIs, it's possible to be awoken before the sleep completes (e.g., due to a signal arriving during the sleep). The correct way to handle this is to compute an absolute wake-up time, then loop, sleeping for the remaining duration. I would imagine sleeping in one-second intervals to be a hack that approximates this, poorly.
However, the boost threading API's this_thread::sleep() is not documented to have these early wakeups, and so this technique is not necessary (the boost thread API does the loop for you).
Generally speaking, there are very few cases where using smaller sleep intervals improves wakeup latency significantly; the OS handles all wakeups more or less the same way. At best, you might keep the cache warm and avoid pageouts, but this would only affect the small portion of memory directly involved in the sleep loop.
Furthermore, most OSes deal with time using integer counters internally; this means that large intervals do not induce rounding errors (as you might find with floating point values). However, if you are using floating point for your own computation, this may be an issue. If you are currently using floating point intervals (say, a double of seconds since 1970), you may wish to consider integer units (say, a long long of milliseconds since 1970).
sleep is not very precise in many cases; how precise depends on the OS. On Windows 7, the timer resolution is about 15.4 ms, I think. Also, you can usually tell the scheduler how to handle sleep slack...
Here is a good read:
Linux: http://linux.die.net/man/3/nanosleep
Windows: http://msdn.microsoft.com/en-us/library/ms686298(v=vs.85).aspx
PS: if you want higher precision on long waits, sleep for some period and use the time difference based on a real-time clock; i.e., store the current time when you start sleeping, then at each interval check how far you are from the target wake time.
Boost.Thread's implementation of sleep for POSIX systems can use different approaches to sleeping:
Timed waiting on a mutex, in case the thread was created with Boost.Thread and has specific thread information.
Use pthread_delay_np, if available and the thread was not created with Boost.Thread.
Use nanosleep if pthread_delay_np is not available.
Create a local mutex and do a timed wait on it (worst-case scenario, if nothing else is available).
Cases 2, 3 and 4 are implemented in a loop of 5 iterations (as of Boost 1.44), so if the sleeping thread is interrupted (e.g. by some signal) more than 5 times there can be a potential problem. But that is not likely to happen.
In all cases, precision will be much higher than a second, so doing multiple sleeps will not be more precise than doing a long one. Your only concern might be the program being completely swapped out because of a long sleep; for example, if the machine is so busy that the kernel pushes the whole program out to disk. To avoid being swapped out, you have to spin (or do smaller sleeps and wake up occasionally). Usually, if performance matters a lot, programs do spin on a CPU and never call sleep, because any blocking call is to be avoided. But that is true only if we are talking nano/microseconds.
In general, Sleep is not the correct method for timing anything. Better to use a precision timer with a callback function. On Windows, one may use the "multimedia" timers, which have a resolution of no more than 1 ms on most hardware. When the timer expires, the OS calls the callback function in close to real time.
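A sketch of such a one-shot multimedia timer (assumes linking with winmm.lib; the 150 ms value is illustrative):
#include <windows.h>
#include <mmsystem.h>
#include <cstdio>

void CALLBACK OnTime(UINT, UINT, DWORD_PTR, DWORD_PTR, DWORD_PTR)
{
    printf("fired\n");                             // do the timed work here
}

int main()
{
    timeBeginPeriod(1);                            // request 1 ms resolution
    timeSetEvent(150, 1, OnTime, 0, TIME_ONESHOT); // fire once in ~150 ms
    Sleep(200);                                    // keep the process alive
    timeEndPeriod(1);
    return 0;
}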
Sleep works in terms of scheduler time quanta (edit: meanwhile, the majority of operating systems support "tickless" schedulers, i.e. there are no longer fixed quanta; however, the principle remains true - there's timer coalescing and such).
Unless you receive a signal, there is no way to wake up before that quantum has been used up. Also, sleep is not designed to be precise or accurate. Further, the time is more a guideline than a rule.
While you may think of the sleep time in terms of "will continue after time X", that is not at all what's going on. Technically, sleep works in terms of "mark the thread not-ready for approximately time X, then mark it ready, invoke the scheduler, and then we'll see what happens". Note the subtle difference between being "ready" and actually running. A thread can in principle be ready for a very long time, and never run.
Therefore, 60x sleep(1) can never be more accurate than sleep(60). It will make the thread not-ready and ready again 60 times, and it will invoke the scheduler 60 times. Since the scheduler cannot run in zero time (nor can a thread be made ready in zero time, nor can you do a context switch in zero time), sleeping many times for short durations necessarily needs to take longer than sleeping once for the cumulative time, in practice.
Since you state that your OS is Ubuntu, you could as well use a timerfd. Set the expiry time to 1 minute and read() on it. If you get EINTR, just read() again; otherwise, you know that a minute is up. Using a timer is the correct thing to do if you want precise timing (on a physical computer it cannot and will never be 100.00% perfect, but it will be as good as you can get, and it will avoid other systematic errors, especially with recurring events).
The POSIX timer_create function will work as well, it's more portable, and it may be half a microsecond or so less overhead (maybe! maybe not!) but it is not nearly as comfortable and flexible as a timerfd.
You cannot get more accurate and reliable than what a timer will provide. On my not particularly impressive Ubuntu machine, timerfds work accurately to a microsecond, no problem. As a plus, it's elegant too: if you ever need to do something else while waiting, such as listen on a socket, you can plug the timerfd into the same epoll as the socket descriptor. You can share it between several processes too, and wake them simultaneously. Or... many other things.
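A minimal sketch of the read()/EINTR loop described above, adapted to the asker's round-minute requirement (assuming Linux with timerfd support):
#include <sys/timerfd.h>
#include <unistd.h>
#include <time.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

int main()
{
    int fd = timerfd_create(CLOCK_REALTIME, 0);
    struct itimerspec its = {};                        // zero interval => one-shot
    its.it_value.tv_sec = (time(NULL) / 60 + 1) * 60;  // next xx:xx:00
    // TFD_TIMER_ABSTIME: it_value is an absolute CLOCK_REALTIME time
    timerfd_settime(fd, TFD_TIMER_ABSTIME, &its, NULL);

    uint64_t expirations;
    while (read(fd, &expirations, sizeof expirations) < 0 && errno == EINTR)
        ;                                              // interrupted: read() again
    printf("minute boundary reached\n");
    close(fd);
    return 0;
}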
If the goal is to sleep until a given system time (xx:xx:00), consider using the overload of boost::this_thread::sleep that takes a time, as in boost::posix_time::ptime, rather than a duration.
for example,
#include <iostream>
#include <boost/date_time.hpp>
#include <boost/thread.hpp>
int main()
{
    using namespace boost::posix_time;
    ptime time = boost::get_system_time();
    std::cout << "time is " << time << '\n';
    time_duration tod = time.time_of_day();
    tod = hours(tod.hours()) + minutes(tod.minutes() + 1);
    time = ptime(time.date(), tod);
    std::cout << "sleeping to " << time << "\n";
    boost::this_thread::sleep(time);
    std::cout << "now the time is " << boost::get_system_time() << '\n';
}
In C++0x these two overloads were given different names: std::this_thread::sleep_for() and std::this_thread::sleep_until().
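For completeness, a sketch of the same round-minute wait using the C++11 names (assuming a C++11 toolchain; not part of the original answer):
#include <chrono>
#include <thread>

int main()
{
    using namespace std::chrono;
    // truncate to the current minute, then target the next one
    auto next_minute = time_point_cast<minutes>(system_clock::now()) + minutes(1);
    std::this_thread::sleep_until(next_minute);   // absolute time point
}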
The answer is yes. It has nothing to do with C++, however; it has everything to do with the operating system.
Because of the greater focus on low power use in current portable systems, the operating systems have been getting smarter about timers.
Both Windows and Linux use timer slack in order to avoid waking up too often. This slack is automatically calculated using the timeout duration. It can be overridden in various ways if a really accurate timer is absolutely required.
What this does for the operating system is to allow it to get into really deep sleep states. If timers are going off all of the time, the CPU and RAM don't get a chance to power down. But if timers are collected together into a batch, the CPU can power up, run all of the timer operations, then power down again.
So if there are 10 programs all sleeping for 60 seconds but offset by a half-second or so, the most efficient use of the CPU is to wake up one time, run all 10 timers and then go back to sleep instead of waking up 10 times.
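On Linux, one such override is the per-thread timer slack; a hedged sketch (assuming the prctl interface; the 50 us value is illustrative):
#include <sys/prctl.h>

int main()
{
    // shrink the calling thread's timer slack; the value is in nanoseconds
    prctl(PR_SET_TIMERSLACK, 50000UL, 0, 0, 0);
    return 0;
}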

Linux, need accurate program timing. Scheduler wake up program

I have a thread running on a Linux system which I need to execute at as accurate intervals as possible, e.g. once every ms.
Currently this is done by creating a timer with
timerfd_create(CLOCK_MONOTONIC, 0)
, and then passing the desired sleep time in a struct with
timerfd_settime(fd, 0, &itval, NULL);
A blocking read call is performed on this timer which halts thread execution and reports lost wakeup calls.
The problem is that at higher frequencies, the system starts losing deadlines, even though CPU usage is below 10%. I think this is because the scheduler is not waking the thread often enough to check the blocking call. Is there a command I can use to tell the scheduler to wake the thread at certain intervals, as far as that is possible?
Busy-waiting is a bad option since the system handles many other tasks.
Thank you.
You need to get RT Linux*, and then increase the RT priority of the process that you want to wake up at regular intervals.
Other than that, I do not see problems in your code, and if your process is not getting blocked, it should work fine.
(*) RT Linux - an OS with some real-time scheduling patches applied.
One way to reduce scheduler latency is to run your process under a realtime scheduling class such as SCHED_FIFO. See sched_setscheduler.
This will generally improve latency a lot, but there's still little guarantee; to further reduce latency spikes, you'll need to move to the realtime branch of Linux, or to a realtime OS such as VxWorks, RTEMS or QNX.
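A minimal sketch of that call (assuming Linux and sufficient privileges, i.e. root or CAP_SYS_NICE; the priority value 50 is illustrative):
#include <sched.h>
#include <stdio.h>

int main()
{
    struct sched_param sp = {};
    sp.sched_priority = 50;                          // 1..99 for SCHED_FIFO
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) // 0 = calling process
        perror("sched_setscheduler");
    return 0;
}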
You won't be able to do what you want unless you run it on an actual "Real Time OS".
If this is Linux-only on an x86 system, I would choose the HPET timer. I think all modern PCs have this hardware timer built in, and it is very, very accurate. It allows you to define a callback that will be called every millisecond, and in this callback you can do your calculations (if they are simple) or just trigger another thread's work using some synchronization object (a condition variable, for example).
Here is an example of how to use this timer: http://blog.fpmurphy.com/2009/07/linux-hpet-support.html
Along with other advice such as setting the scheduling class to SCHED_FIFO, you will need to use a Linux kernel compiled with a high enough tick rate that it can meet your deadline.
For example, a kernel compiled with CONFIG_HZ of 100 or 250 Hz (timer interrupts per second) can never respond to timer events faster than that.
You must also set your timer to be just a little bit faster than you actually need, because timers are allowed to go beyond their requested time but never to expire early; this will give you better results. If you need 1 ms, then I'd recommend asking for 999 us instead.

Could someone explain this interesting behaviour with Sleep(1)?

I was testing how long various win32 API calls will wait when asked to wait for 1ms. I tried:
::Sleep(1)
::WaitForSingleObject(handle, 1)
::GetQueuedCompletionStatus(handle, &bytes, &key, &overlapped, 1)
I was detecting the elapsed time using QueryPerformanceCounter and QueryPerformanceFrequency. The elapsed time was about 15ms most of the time, which is expected and documented all over the Internet. However, for a short period of time the waits were taking about 2ms! It happened consistently for a few minutes, but now it is back to 15ms. I did not use timeBeginPeriod() and timeEndPeriod() calls! Then I tried the same app on another machine, and there the waits constantly take about 2ms! Both machines have Windows XP SP2 and the hardware should be identical. Is there something that explains why wait times vary by so much? TIA
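A sketch of how such a measurement might look (not the asker's exact code):
#include <windows.h>
#include <cstdio>

int main()
{
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);   // counts per second
    QueryPerformanceCounter(&t0);
    Sleep(1);
    QueryPerformanceCounter(&t1);
    printf("Sleep(1) took %.3f ms\n",
           (t1.QuadPart - t0.QuadPart) * 1000.0 / freq.QuadPart);
    return 0;
}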
Thread.Sleep(0) will let any threads of the same priority execute. Thread.Sleep(1) will let any threads of the same or lower priority execute.
Each thread is given an interval of time to execute in, before the scheduler lets another thread execute. As Billy ONeal states, calling Thread.Sleep will give up the rest of this interval to other threads (subject to the priority considerations above).
Windows balances threads over the entire OS - not just in your process. This means that other threads on the OS can also cause your thread to be pre-empted (i.e., interrupted, with the rest of the time interval given to another thread).
There is an article that might be of interest on the topic of Thread.Sleep(x) at:
Priority-induced starvation: Why Sleep(1) is better than Sleep(0) and the Windows balance set manager
Changing the timer's resolution can be done by any process on the system, and the effect is seen globally. See this article on how the Hotspot Java compiler deals with times on windows, specifically:
Note that any application can change the timer interrupt and that it affects the whole system. Windows only allows the period to be shortened, thus ensuring that the shortest requested period by all applications is the one that is used. If a process doesn't reset the period then Windows takes care of it when the process terminates. The reason why the VM doesn't just arbitrarily change the interrupt rate when it starts - it could do this - is that there is a potential performance impact to everything on the system due to the 10x increase in interrupts. However other applications do change it, typically multi-media viewers/players.
The biggest thing sleep(1) does is give up the rest of your thread's quantum. That depends entirely upon how much of your thread's quantum remains when you call sleep.
To aggregate what was said before:
CPU time is assigned in quantums (time slices)
The thread scheduler picks the thread to run. This thread may run for the entire time slice, even if threads of higher priority become ready to run.
Typical time slices are 8..15ms, depending on architecture.
The thread can "give up" the time slice - typically Sleep(0) or Sleep(1). Sleep(0) allows another thread of the same or higher priority to run for the next time slice. Sleep(1) allows "any" thread.
The time slice is global and can be affected by all processes
Even if you don't change the time slice, someone else could.
Even if the time slice doesn't change, you may "jump" between the two different times.
For simplicity, assume a single core, your thread and another thread X.
If Thread X runs at the same priority as yours, crunching numbers, Your Sleep(1) will take an entire time slice, 15ms being typical on client systems.
If Thread X runs at a lower priority, and gives up its own time slice after 4 ms, your Sleep(1) will take 4 ms.
I would say it just depends on how loaded the CPU is; if there aren't many other processes/threads, it could get back to the calling thread a lot faster.