Sleeping for an exact duration - c++

My understanding of the Sleep function is that it follows "at least" semantics, i.e. sleep(5) guarantees that the thread sleeps for at least 5 seconds, but it may remain blocked for more than 5 seconds depending on other factors. Is there a way to sleep for exactly a specified time period (without busy waiting)?

As others have said, you really need a real-time OS to try to achieve this. Precise software timing is quite tricky.
However... although not perfect, you can get much better results than "normal" simply by boosting the priority of the process that needs better timing. In Windows you can achieve this with the SetPriorityClass function. If you set the priority to the highest level (REALTIME_PRIORITY_CLASS: 0x00000100) you'll get much better timing results. Again, this will not be the perfect result you are asking for, though.
This is likely possible on platforms other than Windows as well, but I've never had a reason to do it, so I haven't tested it.
EDIT: As per the comment by Andy T, if your app is multi-threaded you also need to watch out for the priority assigned to the threads. For Windows this is documented here.
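For reference, here is a minimal sketch of what that looks like on Windows (the specific priority levels are just example choices; REALTIME_PRIORITY_CLASS can starve the rest of the system, so use it with care, and without sufficient privileges Windows may grant a lower class than requested):

#include <windows.h>
#include <iostream>

int main()
{
    // Raise the whole process to the real-time priority class.
    if (!SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS))
        std::cerr << "SetPriorityClass failed: " << GetLastError() << '\n';

    // In a multi-threaded app, also raise the time-critical thread itself.
    if (!SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL))
        std::cerr << "SetThreadPriority failed: " << GetLastError() << '\n';

    // ... time-critical work goes here ...
    return 0;
}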
Some background...
A while back I used SetPriorityClass to boost the priority of an application where I was doing real-time analysis of high-speed video and could NOT miss a frame. Frames were arriving at the PC at a very regular frequency of 300 frames per second (fps), driven by an external framegrabber HW, which fired a HW interrupt on every frame, which I then serviced. Since timing was very important, I collected a lot of stats on the interrupt timing (using QueryPerformanceCounter stuff) to see how bad the situation really was, and was appalled at the resulting distributions. I don't have the stats handy, but basically Windows was servicing the interrupt whenever it felt like it when run at normal priority. The histograms were very messy, with the standard deviation being wider than my ~3 ms period. Frequently I would see gigantic gaps of 200 ms or greater in the interrupt servicing (recall that the interrupt fired roughly every 3 ms)! i.e. HW interrupts are FAR from exact! You're stuck with what the OS decides to do for you.
However - when I discovered the REALTIME_PRIORITY_CLASS setting and benchmarked with that priority, it was significantly better and the service interval distribution was extremely tight. I could run 10 minutes of 300 fps and not miss a single frame. Measured interrupt servicing periods were pretty much exactly 1/300 s with a tight distribution.
Also, try to minimize the other things the OS is doing, to help improve the odds of your timing working better in the app where it matters. e.g. no background video transcoding or disk defragmenting or anything else while you're trying to get precision timing with other code!
In summary:
If you really need this, go with a real-time OS
If you can't use a real-time OS (impossible or impractical), boosting your process priority will likely improve your timing by a lot, as it did for me
HW interrupts won't do it... the OS still needs to decide to service them!
Make sure that you don't have a lot of other processes running that are competing for OS attention
If timing is really important to you, do some testing. Although getting code to run exactly when you want it to is not very easy, measuring this deviation is quite easy. The high performance counters in PCs (what you get with QueryPerformanceCounter) are extremely good.
Since it may be helpful (although a bit off topic), here's a small class I wrote a long time ago for using the high performance counters on a Windows machine. It may be useful for your testing:
CHiResTimer.h
#pragma once
#include "stdafx.h"
#include <windows.h>

class CHiResTimer
{
private:
    LARGE_INTEGER frequency;
    LARGE_INTEGER startCounts;
    double ConvertCountsToSeconds(LONGLONG Counts);
public:
    CHiResTimer(); // constructor
    void ResetTimer(void);
    double GetElapsedTime_s(void);
};
CHiResTimer.cpp
#include "stdafx.h"
#include "CHiResTimer.h"
double CHiResTimer::ConvertCountsToSeconds(LONGLONG Counts)
{
return ((double)Counts / (double)frequency.QuadPart) ;
}
CHiResTimer::CHiResTimer()
{
QueryPerformanceFrequency(&frequency);
QueryPerformanceCounter(&startCounts); // starts the timer right away
}
void CHiResTimer::ResetTimer()
{
QueryPerformanceCounter(&startCounts); // reset the reference counter
}
double CHiResTimer::GetElapsedTime_s()
{
LARGE_INTEGER countsNow;
QueryPerformanceCounter(&countsNow);
return ConvertCountsToSeconds(countsNow.QuadPart - startCounts.QuadPart);
}

No.
The reason it has "at least" semantics is that after those 5 seconds some other thread may be busy.
Every thread gets a time slice from the operating system. The operating system controls the order in which the threads are run.
When you put a thread to sleep, the OS puts the thread in a waiting list, and when the timer is over the operating system "wakes" the thread.
This means that the thread is added back to the active threads list, but it isn't guaranteed that it will be added in first place. (What if 100 threads need to be woken in that specific second? Who will go first?)

While standard Linux is not a realtime operating system, the kernel developers pay close attention to how long a high priority process would remain starved while kernel locks are held. Thus, a stock Linux kernel is usually good enough for many soft-realtime applications.
You can schedule your process as a realtime task with the sched_setscheduler(2) call, using either SCHED_FIFO or SCHED_RR. The two have slight differences in semantics, but it may be enough to know that a SCHED_RR task will eventually relinquish the processor to another task of the same priority due to time slices, while a SCHED_FIFO task will only relinquish the CPU to another task of the same priority due to blocking I/O or an explicit call to sched_yield(2).
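As a rough sketch (the priority value here is just an example, and this typically requires root or the CAP_SYS_NICE capability):

#include <sched.h>
#include <cstdio>

int main()
{
    sched_param sp{};
    sp.sched_priority = 50;   // 1..99 for SCHED_FIFO/SCHED_RR; pick to taste

    // pid 0 means "the calling process"
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
        std::perror("sched_setscheduler");

    // ... code from here on runs under the realtime policy ...
    return 0;
}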
Be careful when using realtime scheduled tasks; as they always take priority over standard tasks, you can easily find yourself coding an infinite loop that never relinquishes the CPU and blocks admins from using ssh to kill the process. So it might not hurt to run an sshd at a higher realtime priority, at least until you're sure you've fixed the worst bugs.
There are variants of Linux available that have been worked on to provide hard-realtime guarantees. RTLinux has commercial support; Xenomai and RTAI are competing implementations of realtime extensions for Linux, but I know nothing else about them.

As previous answerers said: there is no way to be exact (some suggested a real-time OS or hardware interrupts, and even those are not exact). I think what you are looking for is simply something more precise than the sleep() function; depending on your OS you will find that in, e.g., the Windows Sleep() function or, under GNU, the nanosleep() function.
http://msdn.microsoft.com/en-us/library/ms686298%28VS.85%29.aspx
http://www.delorie.com/gnu/docs/glibc/libc_445.html
Both will give you precision within a few milliseconds.
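For example, a minimal nanosleep() sketch for POSIX systems (like Sleep, it may still overshoot; it just lets you request sub-millisecond intervals):

#include <time.h>
#include <errno.h>

// Sleep for roughly the requested number of microseconds.
void sleep_us(long microseconds)
{
    timespec req{};
    req.tv_sec  = microseconds / 1000000;
    req.tv_nsec = (microseconds % 1000000) * 1000;

    // nanosleep can be interrupted by a signal; retry with the remaining time.
    timespec rem{};
    while (nanosleep(&req, &rem) == -1 && errno == EINTR)
        req = rem;
}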

Well, you are trying to tackle a difficult problem, and achieving exact timing is not feasible: the best you can do is use hardware interrupts, and the implementation will depend on both your underlying hardware and your operating system (namely, you will need a real-time operating system, which most regular desktop OSes are not). What is your exact target platform?

No. Because you're always depending on the OS to handle waking up threads at the right time.

There is no way to sleep for a specified time period using standard C. You will need, at minimum, a 3rd party library which provides greater granularity, and you might also need a special operating system kernel such as the real-time Linux kernels.
For instance, here is a discussion of how close you can come on Win32 systems.
This is not a C question.

Related

Analyzing spikes in performance measurement

I have a set of C++ functions which does some image processing related operation. Generally I see that the final output is delivered in 5-6ms time range. I am measuring the time taken using QueryPerformanceCounter Win32 API. But when running in a continuous loop with 100 images, I see that the performance spikes up to 20ms for some images. My question is how do I go about analyzing such issues. Basically I want to determine whether the spikes are caused due to some delay in this code or whether some other task started running inside the CPU because of which this operation took time. I have tried using GetThreadTimes API to see how much time my thread spent inside CPU but am unable to conclude based on those numbers. What is the standard way to go about troubleshooting these types of issues?
The reason behind sudden spikes during processing could be any of I/O, interrupts, scheduled processes, etc.
It is very common to see such spikes with such low-latency/short-processing-time operations. IMO you can attribute them to any of the above-mentioned reasons (there could be more). The simplest solution is to run the same experiment with more inputs multiple times and take the average for the final result.
To answer your question about checking/confirming the source of the spike, you can try the following:
Check variation in images - already ruled out as per your comment
Monitor resource utilization during processing. Check if any resource is choking (% utilization is the simplest way to check, and the SAR/NMON utilities on Linux are best, with minimal overhead).
Reserve a few CPUs on the system (CPU affinity) for your experiment, dedicated only to your program, so that no OS task will run on them. taskset is the simplest utility to try out. More details are here.
Run the experiment with this setting and check behavior.
That's a nasty thing you are trying to figure out; I wouldn't even attempt to, since coming to concrete conclusions is hard.
In general, one should run a loop of many iterations (100 just seems too small, I think), and then take the average time for an image to be processed.
That will rule out any unexpected exterior events that may have hurt performance of your program.
A typical way to check if "some other task started running inside the CPU" would be to run your program once and mark the images that produce the spike. For example, images 2, 4, 5, and 67 take too long to be processed. Run your program again a few times, and mark again which images produce the spikes.
If the same images produce these spikes, then it's not something caused by another exterior task.
What is the standard way to go about troubleshooting these types of issues?
There are Real Time Operating Systems (RTOS) which guarantee those kinds of delays. They are a totally different class of operating system from Windows or Linux.
But still, there are things you can do about your delays even on a general-purpose OS.
1. Avoid system calls
Once you ask your OS to read or write something to a disk, there are no guarantees whatsoever about delays. So, avoid any system functions on your critical path:
even functions like gettimeofday() might cause unpredictable delays, so you should really avoid any system calls in time-critical code;
use another thread to perform IO and pass data via a shared buffer to your critical code (see the sketch at the end of this section).
If your code base is big, use tools like strace on Linux or Dr Memory on Windows to trace system calls.
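A minimal sketch of the "IO on another thread, shared buffer" idea (the names here are made up for illustration; a real implementation would likely use a lock-free ring buffer so that even the mutex stays off the critical path):

#include <atomic>
#include <chrono>
#include <deque>
#include <fstream>
#include <mutex>
#include <string>
#include <thread>

std::deque<std::string> g_queue;      // shared buffer
std::mutex              g_queue_mutex;
std::atomic<bool>       g_running{true};

// Time-critical code only appends to the in-memory queue -- no system calls.
void log_from_critical_path(std::string msg)
{
    std::lock_guard<std::mutex> lock(g_queue_mutex);
    g_queue.push_back(std::move(msg));
}

// A background thread drains the queue and does the slow disk IO.
void io_thread()
{
    std::ofstream out("log.txt");
    while (g_running.load())
    {
        std::deque<std::string> batch;
        {
            std::lock_guard<std::mutex> lock(g_queue_mutex);
            batch.swap(g_queue);
        }
        for (const auto& line : batch)
            out << line << '\n';
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}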
2. Avoid context switches
Multithreading on Windows is preemptive. That means there is a system scheduler which might stop your thread at any time and schedule another thread on your CPU. As before, there are RTOSes which allow you to avoid such context switches, but there are things you can do about it:
make sure there is at least one CPU core left for the system and other tasks;
bind each of your threads to a dedicated CPU with SetThreadAffinityMask() (Windows) or sched_setaffinity() (Linux) -- this effectively hints the system scheduler to avoid scheduling other threads on that CPU (see the sketch at the end of this section);
make sure hardware interrupts go to another CPU; usually interrupts go to CPU 0, so the easiest way is to bind your threads to CPU 1 and up;
increase your thread priority, so the scheduler is less likely to switch your thread out for another one.
There are tools like perf (Linux) and Intel VTune (Windows) to confirm whether context switches are happening.
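A sketch of the CPU-binding idea on Windows (which CPU you pick is an assumption for illustration; the Linux equivalent is shown in the comment):

#include <windows.h>
#include <iostream>

int main()
{
    // Pin the current thread to CPU 1, leaving CPU 0 for the OS and interrupts.
    DWORD_PTR mask = DWORD_PTR(1) << 1;
    if (SetThreadAffinityMask(GetCurrentThread(), mask) == 0)
        std::cerr << "SetThreadAffinityMask failed: " << GetLastError() << '\n';

    // Linux equivalent (sketch):
    //   cpu_set_t set; CPU_ZERO(&set); CPU_SET(1, &set);
    //   sched_setaffinity(0, sizeof(set), &set);

    // ... time-critical loop runs here ...
    return 0;
}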
3. Avoid other non-deterministic features
A few more sources of unexpected delays:
disable swap, so you know for sure your thread's memory will not be swapped out to a slow and unpredictable disk drive;
disable CPU turbo boost -- after a CPU boosts there is always a slowdown, so that the CPU stays within its thermal design power (TDP);
disable hyper-threading -- from the scheduler's point of view hyper-threads are independent CPUs, but in fact the performance of each one depends on what the other thread on the same core is doing at the moment.
Hope this helps.

How to guarantee exact thread sleep interval?

Usually if I want to simulate some work or wait for an exact time interval I use condition_variable::wait_for or, at worst, std::this_thread::sleep_for. But the condition_variable documentation states that the wait_for and wait_until methods may block for longer than was requested.
This function may block for longer than timeout_duration due to scheduling or resource contention delays.
How can exact wait intervals be guaranteed?
UPDATE
Can I achieve this without condition_variable?
You cannot do this.
In order to have exact guarantees like this, you need a real time operating system.
C++ does not guarantee you are on a real time operating system.
So it provides the guarantees that a typical, non-RTOS provides.
Note that there are other complications to programming on a RTOS that go far beyond the scope of this question.
In practice, one thing people do when they really want fine-grained timing control (say, they are twiddling with per-frame or per-scanline buffers or the like, or audio buffers, or whatever) is check if the time is short and, if so, spin-wait. If the time is longer, they sleep for a bit less than the amount of time they want to wait, then wake up and spin.
This is also not guaranteed to work, but works well enough for almost all cases.
On a RTOS, the platform may provide primitives like you want. These lie outside the scope of standard C++. No typical desktop OS is an RTOS that I am aware of. If you are programming for a fighter jet's control hardware or similar, you may be programming on an RTOS.
I hope you aren't writing fighter jet control software and asking this question on stack overflow.
If you did hypothetically sleep for precisely some exact duration, and then performed some action in response (such as getting the current time, or printing a message to the screen), then that action might be delayed for some unknown period of time e.g. due to processor load. This is equivalent to the action happening (almost) immediately but the timer taking longer than expected. Even in the best case scenario, where the timer completes at precisely the time you request, and the operating system allows your action to complete without preempting your process, it will take a few clock cycles to perform that action.
So in other words, on a standard operating system, it is impossible or maybe even meaningless for a timer to complete at precisely the time requested.
How can this be overcome? An academic answer is that you can use specialized software and hardware such as a real-time operating system, but this is vastly more complicated to develop software for than regular programming. What you probably really want to know is that, in the common case, the delay that the documentation refers to is not substantial, i.e. it is commonly less than 1/100th of a second.
With a brute force loop... for example:
#include <chrono>
using namespace std;

chrono::microseconds sleep_duration{1000};
auto now = chrono::high_resolution_clock::now();
while (true)
{
    auto elapsed = chrono::duration_cast<chrono::microseconds>(chrono::high_resolution_clock::now() - now);
    if (elapsed > sleep_duration)
        break;
}
That's a bit ugly, but desktop operating systems are not real-time, so you cannot have such precision otherwise.
In order to relax the CPU, look at the following snippet:
void little_sleep(std::chrono::microseconds us)
{
    auto start = std::chrono::high_resolution_clock::now();
    auto end = start + us;
    do {
        std::this_thread::yield();
    } while (std::chrono::high_resolution_clock::now() < end);
}
That depends on what accuracy you can expect. Generally, as others have said, a regular OS (Linux, Windows) cannot guarantee that.
Why?
Your OS probably has a concept of threads. If so, there is a scheduler which interrupts threads and switches execution to other threads waiting in the queue. This can spoil the accuracy of timers.
What can I do about it?
If you are using an embedded system, go for bare metal, i.e. don't use an OS, or use a so-called hard real-time operating system.
If you are using Linux, look for the Linux RT Preempt Patch in Google. You have to recompile your kernel to include the patch (not so complicated though), and then you can create threads with priority above 50 -- which means priority above the kernel's threads -- which in the end means that you can have a thread that can interrupt the scheduler and the kernel in general, providing quite good time accuracy. In my case the improvement was three orders of magnitude (from a few ms of latency to a few us). A sketch of setting such a thread priority follows at the end of this answer.
If you are using Windows, I don't know of such a patch, but you can search for High Precision Timers on the Microsoft site. Maybe the provided accuracy will be enough for your needs.
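For the Linux RT Preempt case mentioned above, here is a sketch of giving a single thread realtime priority (the priority value 80 is just an example; this needs root or an appropriate rtprio limit):

#include <pthread.h>
#include <sched.h>
#include <cstdio>

void make_current_thread_realtime()
{
    sched_param sp{};
    sp.sched_priority = 80;   // above 50, so it can preempt most kernel threads on RT Preempt

    int err = pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
    if (err != 0)
        std::fprintf(stderr, "pthread_setschedparam failed: %d\n", err);
}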

Multitasking and measuring time difference

I understand that a preemptive multitasking OS can interrupt a process at any "code position".
Given the following code:
int main() {
    while( true ) {
        doSthImportant(); // needs to be executed at least each 20 msec
        // start of critical section
        int start_usec = getTime_usec();
        doSthElse();
        int timeDiff_usec = getTime_usec() - start_usec;
        // end of critical section
        evalUsedTime( timeDiff_usec );
        sleep_msec( 10 );
    }
}
I would expect this code to usually produce proper results for timeDiff_usec, especially when doSthElse() and getTime_usec() don't take much time, so that they rarely get interrupted by the OS scheduler.
But the program will get interrupted from time to time somewhere in the "critical section". The context switch will do what it is supposed to do, and in such a case the program will still produce wrong results for timeDiff_usec.
This is the only example I have in mind right now but I'm sure there would be other scenarios where multitasking might get a program(mer) into trouble (as time is not the only state that might be changed at re-entry).
Is there a way to ensure that measuring the time for a certain action works fine?
Which other common issues are critical with multitasking and need to be considered? (I'm not thinking of thread safety - but there might be common issues).
Edit:
I changed the sample code to make it more precise.
I want to check the time being spent to make sure that doSthElse() doesn't take like 50 msec or so, and if it does I would look for a better solution.
Is there a way to ensure that measuring the time for a certain action works fine?
That depends on your operating system and your privilege level. On some systems, for some privilege levels, you can set a process or thread to have a priority that prevents it from being preempted by anything at lower priority. For example, on Linux, you might use sched_setscheduler to give a thread real-time priority. (If you're really serious, you can also set the thread affinity and SMP affinities to prevent any interrupts from being handled on the CPU that's running your thread.)
Your system may also provide time tracking that accounts for time spent preempted. For example, POSIX defines the getrusage function, which returns a struct containing ru_utime (the amount of time spent in “user mode” by the process) and ru_stime (the amount of time spent in “kernel mode” by the process). These should sum to the total time the CPU spent on the process, excluding intervals during which the process was suspended. Note that if the kernel needs to, for example, spend time paging on behalf of your process, it's not defined how much (if any) of that time is charged to your process.
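A brief sketch of reading those counters with getrusage (POSIX):

#include <sys/resource.h>
#include <cstdio>

int main()
{
    rusage usage{};
    if (getrusage(RUSAGE_SELF, &usage) == 0)
    {
        double user_s = usage.ru_utime.tv_sec + usage.ru_utime.tv_usec / 1e6;
        double sys_s  = usage.ru_stime.tv_sec + usage.ru_stime.tv_usec / 1e6;
        std::printf("user: %.6f s, system: %.6f s\n", user_s, sys_s);
    }
    return 0;
}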
Anyway, the common way to measure time spent on some critical action is to time it (essentially the way your question presents) repeatedly, on an otherwise idle system, throw out outlier measurements, and take the mean (after eliminating outliers), or take the median or 95th percentile of the measurements, depending on why you need the measurement.
Which other common issues are critical with multitasking and need to be considered? (I'm not thinking of thread safety - but there might be common issues).
Too broad. There are whole books written about this subject.

How to do something every millisecond or better on Windows

This question is not about timing something accurately on Windows (XP or better), but rather about doing something very rapidly via callback or interrupt.
I need to be doing something regularly every 1 millisecond, or preferably even every 100 microseconds. What I need to do is drive some asynchronous hardware (Ethernet) at this rate to output a steady stream of packets to the network, and make that stream appear to be as regular and synchronous as possible. But if the question can be separated from the (Ethernet) device, it would be good to know the general answer.
Before you say "don't even think about using Windows!!!!", a little context. Not all real-time systems have the same demands. Most of the time songs and video play acceptably on Windows despite needing to handle blocks of audio or images every 10-16ms or so on average. With appropriate buffering, Windows can have its variable latencies, but the hardware can be broadly immune to them, and keep a steady synchronous stream of events happening. Even so, most of us tolerate the occasional glitch. My application is like that - probably quite tolerant.
The expensive option for me is to port my entire application to Linux. But Linux is simply different software running on the same hardware, so my strong preference is to write some better software, and stick with Windows. I have the luxury of being able to eliminate all competing hardware and software (no internet or other network access, no other applications running, etc). Do I have any prospect of getting Windows to do this? What limitations will I run into?
I am aware that my target hardware has a High Performance Event Timer, and that this timer can be programmed to interrupt, but that there is no driver for it. Can I write one? Are there useful examples out there? I have not found one yet. Would this interfere with QueryPerformanceCounter? Does the fact that I'm going to be using an ethernet device mean that it all becomes simple if I use select() judiciously?
Pointers to useful articles welcomed - I have found dozens of overviews on how to get accurate times, but none yet on how to do something like this other than by using what amounts to a busy wait. Is there a way to avoid a busy wait? Is there a kernel mode or device driver option?
You should consider looking at the Multimedia Timers. These are timers intended for the sort of resolution you are looking at.
Have a look here on MSDN.
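As a rough sketch of a 1 ms periodic multimedia timer (the callback body is a placeholder, and note that the callback runs on a separate timer thread):

#include <windows.h>
#include <mmsystem.h>
#pragma comment(lib, "winmm.lib")

// Called roughly every millisecond on a timer thread.
void CALLBACK TimerProc(UINT, UINT, DWORD_PTR, DWORD_PTR, DWORD_PTR)
{
    // ... kick off the next packet / signal an event here ...
}

int main()
{
    timeBeginPeriod(1);                         // request 1 ms timer resolution

    MMRESULT timerId = timeSetEvent(1,          // period: 1 ms
                                    0,          // best possible resolution
                                    TimerProc,
                                    0,          // user data passed to the callback
                                    TIME_PERIODIC | TIME_CALLBACK_FUNCTION);

    Sleep(10000);                               // let the timer run for a while

    timeKillEvent(timerId);
    timeEndPeriod(1);
    return 0;
}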
I did this using DirectX 9, using QueryPerformanceCounter, but you will need to hog at least one core, as task switching will mess you up.
For a good comparison of timers you can look at
http://www.geisswerks.com/ryan/FAQS/timing.html
If you run into timer granularity issues, I would suggest using good old Sleep() with a spin loop. Essentially, the code should do something like:
#include <stdint.h>

void PrecisionSleep(uint64_t microSec)
{
    uint64_t start_time;
    start_time = GetCurrentTime(); // assuming GetCurrentTime() returns microsecs
    // Calculate number of 10ms intervals using standard OS sleep.
    Sleep(10 * (microSec / 10000)); // assuming Sleep() takes millisecs as argument
    // Spin loop to spend the rest of the time in
    while (GetCurrentTime() - start_time < microSec)
    {}
}
This way, you will have a high precision sleep which wouldn't tax your CPU much if a lot of them are larger than the scheduling granularity (assumed 10ms). You can send your packets in a loop while you use the high precision sleep to time them.
The reason audio works fine on most systems is that the audio device has its own clock. You just buffer the audio data to it and it takes care of playing it and interrupts the program when the buffer is empty. In fact, a time skew between the audio card clock and the CPU clock can cause problems if a playback engine relies on the CPU clock.
EDIT:
You can make a timer abstraction out of this by using a thread with a lock-protected min-heap of timed entries (the heap comparison is done on the expiry timestamp); you can then either invoke a callback() or SetEvent() when the PrecisionSleep() to the next timestamp completes.
Use NtSetTimerResolution when the program starts up to set the timer resolution. Yes, it is an undocumented function, but it works well. You may also use NtQueryTimerResolution to read the timer resolution (before setting, and after setting the new resolution, to be sure).
You need to get the addresses of these functions dynamically using GetProcAddress from NTDLL.DLL, as they are not declared in any header or LIB file.
Setting timer resolution this way would affect Sleep, Windows timers, functions that return current time etc.
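A hedged sketch of the dynamic lookup (the prototype shown is the one commonly reported for this undocumented call; the resolution is in 100 ns units, so 5000 requests 0.5 ms, and you should check the returned current resolution to see what you actually got):

#include <windows.h>
#include <iostream>

// Commonly reported prototype of the undocumented call (units of 100 ns).
typedef LONG (NTAPI *NtSetTimerResolution_t)(ULONG DesiredResolution,
                                             BOOLEAN SetResolution,
                                             PULONG CurrentResolution);

int main()
{
    HMODULE ntdll = GetModuleHandleW(L"ntdll.dll");
    auto NtSetTimerResolution = reinterpret_cast<NtSetTimerResolution_t>(
        GetProcAddress(ntdll, "NtSetTimerResolution"));

    if (NtSetTimerResolution)
    {
        ULONG actual = 0;
        NtSetTimerResolution(5000, TRUE, &actual);   // ask for 0.5 ms
        std::cout << "Actual resolution: " << actual / 10000.0 << " ms\n";
    }
    return 0;
}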

Sleep thread for 100.8564 milliseconds in C++ under the Windows platform

Is there any method to sleep a thread for 100.8564 milliseconds under the Windows OS? I am using a multimedia timer, but its minimum resolution is 1 second. Kindly guide me so that I can handle the fractional part of the millisecond.
Yes you can do it. See QueryPerformanceCounter() to read accurate time, and make a busy loop.
This will enable you to make waits with up to 10-nanosecond resolution; however, if the thread scheduler decides to steal control from you at the moment the cycle ends, it will, and there's nothing you can do about it except assign your process realtime priority.
You may also have a look at this: http://msdn.microsoft.com/en-us/library/ms838340(WinEmbedded.5).aspx
Several frameworks were developed to do hard realtime on windows.
Otherwise, your question probably implies that you might be doing something wrong. There are numerous mechanisms to avoid ever needing precise delays, such as using proper bus drivers (in the case of hardware/IO, or the respective DMAs if you are designing a driver), and more.
Please tell us what exactly you are building.
I do not know your use case, but even a high-end realtime operating system would be hard pressed to achieve less than 100 ns jitter on timings.
In most cases I found you do not need that precision for reproducibility, but only for long-term drift. In that respect it is relatively straightforward to keep a timeline and calculate the events at the desired precision, then use that timeline to synchronize the events, which may be off even by tens of ms. As long as these errors do not add up, I found I got adequate performance.
If you need guaranteed latency, you cannot get it with MS Windows. It's not a realtime operating system. It might swap in another thread or process at an inopportune instant. You might get a cache miss. When I did a robot controller a while back, I used an OS called On Time RTOS 32. It has an MS Windows API emulation layer. You can use it with Visual Studio. You'll need something like that.
The resolution of a multimedia timer is much better than one second. It can go down to 1 millisecond when you call timeBeginPeriod(1) first. The timer will automatically adjust its interval for the next call when the callback is delivered late, which is inevitable on a multi-tasking operating system; there is always some kind of kernel thread with a higher priority than yours that will delay the callback.
While it will work pretty well on average, worst case latency is in the order of hundreds of milliseconds. Clearly, your requirements cannot be met by Windows by a long shot. You'll need some kind of microcontroller to supply that kind of execution guarantee.