I have a thread running on a Linux system which I need to execute at as accurate intervals as possible, e.g. once every millisecond.
Currently this is done by creating a timer with
timerfd_create(CLOCK_MONOTONIC, 0)
, and then passing the desired sleep time in a struct with
timerfd_settime(fd, 0, &itval, NULL);
A blocking read() is then performed on this timer; it halts thread execution and reports the number of missed expirations.
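For reference, a minimal sketch of the setup described above (the 1 ms period and the variable names are illustrative, not my exact code):

#include <stdint.h>
#include <sys/timerfd.h>
#include <unistd.h>

int main(void) {
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);

    // 1 ms periodic timer: first expiry and interval are both 1,000,000 ns
    struct itimerspec itval = {0};
    itval.it_value.tv_nsec    = 1000000;
    itval.it_interval.tv_nsec = 1000000;
    timerfd_settime(fd, 0, &itval, NULL);

    for (;;) {
        uint64_t expirations;                        // expirations since the last read
        read(fd, &expirations, sizeof(expirations)); // blocks until the next tick
        if (expirations > 1) { /* (expirations - 1) deadlines were missed */ }
        // periodic work goes here
    }
}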
The problem is that at higher frequencies the system starts losing deadlines, even though CPU usage is below 10%. I think this is because the scheduler is not waking the thread often enough to check the blocking call. Is there a command I can use to tell the scheduler to wake the thread at certain intervals, as far as that is possible?
Busy-waiting is a bad option since the system handles many other tasks.
Thank you.
You need to get RT Linux*, and then increase the RT priority of the process that you want to wake up at regular intervals.
Other than that, I do not see problems in your code, and if your process is not getting blocked, it should work fine.
(*) RT Linux: an OS with some real-time scheduling patches applied.
One way to reduce scheduler latency is to run your process under a realtime scheduling policy such as SCHED_FIFO. See sched_setscheduler.
This will generally improve latency a lot, but there is still little guarantee. To further reduce latency spikes you'll need to move to the realtime branch of Linux, or to a realtime OS such as VxWorks, RTEMS or QNX.
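A minimal sketch of switching the calling process to SCHED_FIFO (the priority value 80 is arbitrary, and this normally requires root or CAP_SYS_NICE):

#include <sched.h>
#include <stdio.h>

// Switch the calling process to the SCHED_FIFO realtime policy.
void make_realtime(void) {
    struct sched_param param;
    param.sched_priority = 80;                            // SCHED_FIFO priorities run 1..99, higher is more urgent
    if (sched_setscheduler(0, SCHED_FIFO, &param) != 0)   // pid 0 = calling process
        perror("sched_setscheduler");                     // typically EPERM without sufficient privileges
}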
You won't be able to do what you want unless you run it on an actual "Real Time OS".
If this is Linux on an x86 system, I would choose the HPET timer. I think all modern PCs have this hardware timer built in and it is very, very accurate. It allows you to define a callback that will be called every millisecond, and in this callback you can do your calculations (if they are simple) or just trigger another thread's work using some synchronization object (a condition variable, for example).
Here is an example of how to use this timer: http://blog.fpmurphy.com/2009/07/linux-hpet-support.html
Along with other advice such as setting the scheduling class to SCHED_FIFO, you will need to use a Linux kernel compiled with a high enough tick rate that it can meet your deadline.
For example, a kernel compiled with CONFIG_HZ of 100 or 250 Hz (timer interrupts per second) can never respond to timer events faster than that.
You must also set your timer to be just a little bit faster than you actually need, because timers are allowed to expire late but never early; this will give you better results. If you need 1 ms, then I'd recommend asking for 999 us instead.
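With the timerfd setup from the question, that might look like this (999 us expressed in nanoseconds):

struct itimerspec itval = {0};
itval.it_value.tv_nsec    = 999000;   // first expiry after 999 us
itval.it_interval.tv_nsec = 999000;   // then every 999 us instead of a full 1 ms
timerfd_settime(fd, 0, &itval, NULL);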
I know that Sleep() is not accurate, but is there a way to make it not sleep for more than 10 ms (i.e. only sleep between 1 ms and 10 ms)? Or does Sleep(1) already guarantee that?
If you really want guaranteed timings, you will not be using Windows at all.
To answer your question, Sleep() does not provide any means of guaranteeing an upper bound on the sleep time.
In Windows, this is because Sleep() relinquishes the thread's time slice, and it is not guaranteed that the system scheduler will schedule the sleeping thread (i.e. allocate another time slice) to execute immediately after the sleep time is up. That depends on the priorities of competing threads, scheduling policies, and things like that.
In practice, the actual sleep interval depends a lot on what other programs are running on the system, the configuration of the system, whether other programs are accessing slow drives, and so on.
With a lightly loaded system, it is a fair bet Sleep(1) will sleep between 1 and 2 ms on any modern (GHz-class CPU or better) machine. However, it is not impossible for your program to experience greater delays.
With a heavily loaded system (lots of other programs executing, using CPU and timer resources), it is a fair bet your program will experience substantially greater delays than 1 ms, and even more than 10 ms.
In short: no guarantees.
There is no way to guarantee it.
This is what real-time OSes are for.
In general, if your OS is not under heavy load, sleep will be pretty accurate, but the more load you put on it, the more inaccurate it will get.
No. Or, yes, depending on your perspective.
According to the documentation:
After the sleep interval has passed, the thread is ready to run. If you specify 0 milliseconds, the thread will relinquish the remainder of its time slice but remain ready. Note that a ready thread is not guaranteed to run immediately. Consequently, the thread may not run until some time after the sleep interval elapses. For more information, see Scheduling Priorities.
What this means is that the problem isn't Sleep. Rather, when Sleep ends, your thread may still need to wait to become active again.
You cannot count on 10 milliseconds; that's too low. Sleep() accuracy is affected by:
The clock tick interrupt frequency. In general, the processor tends to be in a quiescent state, not consuming any power and turned off by the HLT instruction. It is dead to the world, unaware that time is passing and unaware that your sleep interval has expired. A periodic hardware interrupt generated by the chipset wakes it up and makes it pay attention again. By default, this interrupt is generated 64 times per second. Or once every 15.625 milliseconds.
The thread scheduler runs at every clock interrupt. It is the one that notices that your sleep interval has expired and puts the thread back into the ready-to-run state, boosting its priority so that it is more likely to acquire a processor core. The thread gets a core when no other threads with higher priority are ready to run.
There isn't much you can do about the second bullet; you have to compete with everybody else and take your fair share. If the thread does a lot of sleeping and little computation then it is not unreasonable to claim more than your fair share: call SetThreadPriority() to boost your base priority and make it more likely that your sleep interval is accurate. If that isn't good enough then the only way to claim a priority high enough to always beat everybody else is by writing ring 0 code, a driver.
You can mess with the first bullet; it is pretty common to do so, and it is also the reason why many programmers think that the default accuracy is 10 msec. Or 1 msec if they use Chrome: that browser jacks up the interrupt rate sky-high. A fairly unreasonable thing to do, bad for power consumption, unless you are in the business of making your mobile operating system products look good :)
Call timeBeginPeriod() when you need your sleep intervals to be short, and timeEndPeriod() when you're done. Use NtSetTimerResolution() if you need to go lower than 1 msec.
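A minimal sketch of that sequence (the 1 ms resolution and the surrounding loop are illustrative):

#include <windows.h>
#pragma comment(lib, "winmm.lib")   // timeBeginPeriod/timeEndPeriod live in winmm

int main() {
    timeBeginPeriod(1);             // raise the clock interrupt rate to ~1 ms
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST);  // optional base-priority boost
    for (int i = 0; i < 1000; ++i) {
        Sleep(1);                   // now wakes close to every 1 ms instead of every 15.625 ms
        // periodic work here
    }
    timeEndPeriod(1);               // restore the default resolution when done
    return 0;
}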
Sleep won't guarantee that.
The only way I know of doing that is to have a thread wait for a fast timer event and free a synchronization object every 10 ms or so.
You will pass a semaphore to this "wait server task", and it will free it on the next timer tick, thus giving you a response time between 0 and 10 ms.
Of course, if you want extreme precision you will have to boost this thread's priority above other tasks that might preempt it, and at any rate you might still be preempted by system processes and/or interrupt handlers, which will add some noise to your timer.
There is an easy way to calculate the duration of any function, described here: How to Calculate Execution Time of a Code Snippet in C++
start_timestamp = get_current_uptime();
// measured algorithm
duration_of_code = get_current_uptime() - start_timestamp;
But it does not give a clean duration, because time spent executing other threads will be included in the measured time.
So the question is: how do I account for the time the code spends in other threads?
OS X code preferred, although it would also be great to see Windows or Linux code...
Update: the ideal(?) concept of the code:
start_timestamp = get_this_thread_current_uptime();
// measured algorithm
duration_of_code = get_this_thread_current_uptime() - start_timestamp;
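For illustration (not OS X specific): on Linux, something close to the ideal concept above is the calling thread's CPU-time clock, which only advances while this thread is actually on the CPU. The helper name below is the hypothetical one from the pseudocode:

#include <time.h>

// CPU time consumed by the calling thread, in nanoseconds (Linux: CLOCK_THREAD_CPUTIME_ID).
static long long get_this_thread_current_uptime(void) {
    struct timespec ts;
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}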
I'm sorry to say that in the general case there is no way to do what you want. You are looking for worst-case execution time, and there are several methods to get a good approximation for this, but there is no perfect way as WCET is equivalent to the Halting problem.
If you want to exclude the time spent in other threads then you could disable task context switches upon entering the function that you want to measure. This is RTOS dependent but one possibility is to raise the priority of the current thread to the maximum. If this thread is max priority then other threads won't be able to run. Remember to reset the thread priority again at the end of the function. This measurement may still include the time spent in interrupts, however.
Another idea is to disable interrupts altogether. This could remove other threads and interrupts from your measurement. But with interrupts disabled the timer interrupt may not function properly. So you'll need to setup a hardware timer appropriately and rely on the timer's counter value register (rather than any time value derived from a timer interrupt) to measure the time. Also make sure your function doesn't call any RTOS routines that allow for a context switch. And remember to restore interrupts at the end of your function.
Another idea is to run the function many times and record the shortest duration measured over those many runs; longer durations probably include time spent in other threads, but the shortest duration may be just the function with no other threads (a rough sketch follows after these ideas).
Another idea is to set a GPIO pin upon entry to and clear it upon exit from the function. Then monitor the GPIO pin with an oscilloscope (or logic analyzer). Use the oscilloscope to measure the period for when the GPIO pin is high. In order to remove the time spent in other threads you would need to modify the RTOS scheduler routine that selects the thread to run. Clear the GPIO pin in the scheduler when another thread runs and set it when the scheduler returns to your function's thread. You might also consider clearing the GPIO pin in interrupt handlers.
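A rough sketch of the shortest-of-many-runs idea mentioned above (the run count and the choice of clock are arbitrary):

#include <limits.h>
#include <time.h>

// Run the function under test many times and return the shortest duration in nanoseconds.
long long measure_best_ns(void (*measured_algorithm)(void)) {
    long long best = LLONG_MAX;
    for (int i = 0; i < 1000; ++i) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        measured_algorithm();                       // the function under test
        clock_gettime(CLOCK_MONOTONIC, &t1);
        long long ns = (t1.tv_sec - t0.tv_sec) * 1000000000LL + (t1.tv_nsec - t0.tv_nsec);
        if (ns < best) best = ns;                   // keep the shortest observed duration
    }
    return best;
}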
Your question is entirely OS dependent. The only way you can accomplish this is to somehow get a guarantee from the OS that it won't preempt your process to perform some other task, and to my knowledge this is simply not possible in most consumer OS's.
RTOS often do provide ways to accomplish this though. With Windows CE, anything running at priority 0 (in theory) won't be preempted by another thread unless it makes a function/os api/library call that requires servicing from another thread.
I'm not super familiar with OS X, but after glancing at the documentation, OS X is a "soft" realtime operating system. This means that technically what you want can't be guaranteed. The OS may decide that there is "something" more important than your process that NEEDS to be done.
OSX does however allow you to specify a Real-time process which means the OS will make every effort to honor your request to not be interrupted and will only do so if it deems absolutely necessary.
Mac OS X Scheduling documentation provides examples on how to set up real-time threads
OSX is not an RTOS, so the question is mistitled and mistagged.
In a true RTOS you can lock the scheduler, disable interrupts or raise the task to the highest priority (with round-robin scheduling disabled if other tasks share that priority) to prevent preemption - although only interrupt disable will truly prevent preemption by interrupt handlers. In a GPOS, even if it has a priority scheme, that normally only controls the number of timeslices allowed to a process in what is otherwise round-robin scheduling, and does not prevent preemption.
One approach is to make many repeated tests and take the smallest value obtained, since that is likely to be the one where the fewest preemptions occurred. It will also help to set the process to the highest priority in order to minimise the number of preemptions. But bear in mind that on a GPOS many interrupts from devices such as the mouse, keyboard, and system clock will occur and consume a small (and possibly negligible) amount of time.
I am working on a threaded application on Linux in C++ which attempts to be real time, doing an action on a heartbeat, or as close to it as possible.
In practice, I find the OS is swapping out my thread and causing delays of up to a tenth of a second while it is switched out, causing the heartbeat to be irregular.
Is there a way my thread can hint to the OS that now is a good time to context switch it out? I could make this call right after doing a heartbeat, and thus minimize the delay due to an ill timed context switch.
It is hard to say what the main problem is in your case, but it is most certainly not something that can be corrected with a call to sched_yield() or pthread_yield(). The only well-defined use for yielding, in Linux, is to allow a different ready thread to preempt the currently CPU-bound running thread at the same priority on the same CPU under SCHED_FIFO scheduling policy. Which is a poor design decision in almost all cases.
If you're serious about your goal of "attempting to be real-time" in Linux, then first of all, you should be using a real-time sched_setscheduler setting (SCHED_FIFO or SCHED_RR, FIFO preferred).
Second, get the full preemption patch for Linux (from kernel.org if your distro does not supply one). It will also give you the ability to reschedule device driver threads and to run your thread at a higher priority than, say, the hard disk or ethernet driver threads.
Third, see RTWiki and other resources for more hints on how to design and set up a real-time application.
This should be enough to get you under 10 microseconds response time, regardless of system load, on any decent desktop system. I have an embedded system where I can only squeeze out a 60 us response time when idle and 150 us under heavy disk/system load, but it's still orders of magnitude faster than what you're describing.
You can tell the currently executing thread to pause execution with various commands such as yield.
Just telling the thread to pause is non-deterministic: 999 times out of 1000 it might give good intervals, and one time it won't.
You'll probably want to look at real-time scheduling for consistent results. This site http://www2.net.in.tum.de/~gregor/docs/pthread-scheduling.html seems to be a good starting spot for researching thread scheduling.
use sched_yield
And for threads there is pthread_yield: http://www.kernel.org/doc/man-pages/online/pages/man3/pthread_yield.3.html
I'm a bit confused by the question. If your program is just waiting on a periodic heartbeat and then doing some work, then the OS should know to schedule other things when you go back to waiting on the heartbeat.
You aren't spinning on a flag to get your "heartbeat" are you?
You are using a timer function such as setitimer(), right? RIGHT???
If not, then you are doing it all wrong.
You may need to specify a timer interval that is just a little shorter than what you really need. If you are using a real-time scheduler priority and a timer, your process will almost always be woken up on time.
I would say always on time, but Linux isn't a perfect real-time OS yet.
I'm not too sure about Linux, but on Windows it's been explained that you can't ask the system not to interrupt you, for several reasons (first paragraph mostly). Off the top of my head, one of the reasons is hardware interrupts that can occur at any time and over which you have no control.
EDIT: Someone just suggested the use of sched_yield, then deleted their answer. It will relinquish time for your whole process, though. You can also use sched_setscheduler to hint to the kernel about what you need.
I've written a C++ library that does some seriously heavy CPU work (all of it math and calculations) and if left to its own devices, will easily consume 100% of all available CPU resources (it's also multithreaded to the number of available logical cores on the machine).
As such, I have a callback inside the main calculation loop that software using the library is supposed to call:
while(true)
{
    //do math here
    callback(percent_complete);
}
In the callback, the client calls Sleep(x) to slow down the thread.
Originally, the client-side code was a fixed Sleep(100) call, but this led to bad, unreliable performance because some machines finish the math faster than others while the sleep is the same on all machines. So now the client checks the system time, and if more than 1 second has passed (which equals several iterations), it will sleep for half a second.
Is this an acceptable way of slowing down a thread? Should I be using a semaphore/mutex instead of Sleep() in order to maximize performance? Is sleeping x milliseconds for each 1 second of processing work fine or is there something wrong that I'm not noticing?
The reason I ask is that the machine still gets heavily bogged down even though taskman shows the process taking up ~10% of the CPU. I've already explored hard disk and memory contention to no avail, so now I'm wondering if the way I'm slowing down the thread is causing this problem.
Thanks!
Why don't you use a lower priority for the calculation threads? That will ensure other threads are scheduled while allowing your calculation threads to run as fast as possible if no other threads need to run.
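For instance, on Windows something like this at the start of each calculation thread would do it (THREAD_PRIORITY_BELOW_NORMAL is just one reasonable choice):

#include <windows.h>

// Call once at the start of each calculation thread: interactive threads get scheduled
// first, while the math still runs flat out whenever the machine is otherwise idle.
void lower_worker_priority() {
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_BELOW_NORMAL);
}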
What is wrong with the CPU at 100%? That's what you should strive for, not try to avoid. These math calculations are important, no? Unless you're trying to avoid hogging some other resource not explicitly managed by the OS (a mutex, the disk, etc) and used by the main thread, generally trying to slow your thread down is a bad idea. What about on multicore systems (which almost all systems will be, going forward)? You'd be slowing down a thread for absolutely no reason.
The OS has a concept of a thread quantum. It will take care of ensuring that no important thread on your system is starved. And, as I mentioned, on multicore systems spiking one thread on one CPU does not hurt performance for other threads on other cores at all.
I also see in another comment that this thread is also doing a lot of disk I/O - these operations will already cause your thread to yield while it's waiting for the results, so the sleeps will do nothing.
In general, if you're calling Sleep(x), there is something wrong/lazy with your design, and if x==0, you're opening yourself up to live locks (the thread calling Sleep(0) can actually be rescheduled immediately, making it a noop).
Sleep should be fine for throttling an app, which from your comments is what you're after. Perhaps you just need to be more precise about how long you sleep for.
The only software in which I use a feature like this is the BOINC client. I don't know what mechanism it uses, but it's open-source and multi-platform, so help yourself.
It has a configuration option ("limit CPU use to X%"). The way I'd expect to implement that is to use platform-dependent APIs like clock() or GetSystemTimes(), and compare processor time against elapsed wall clock time. Do a bit of real work, check whether you're over or under par, and if you're over par sleep for a while to get back under.
The BOINC client plays nicely with priorities, and doesn't cause any performance issues for other apps even at 100% max CPU. The reason I use the throttle is that otherwise the client runs the CPU flat-out all the time, which drives up the fan speed and noise. So I run it at the level where the fan stays quiet. With better cooling maybe I wouldn't need it :-)
Another, less elaborate, method could be to time one iteration and let the thread sleep for (x * t) milliseconds before the next iteration, where t is the millisecond time for one iteration and x is the chosen sleep-time fraction (between 0 and 1).
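A rough sketch of that idea (x = 0.5 is arbitrary, and the loop condition and work function are placeholders):

#include <chrono>
#include <thread>

void throttled_loop() {
    const double x = 0.5;                   // chosen sleep fraction
    while (more_work_to_do()) {             // placeholder loop condition
        auto start = std::chrono::steady_clock::now();
        do_one_iteration();                 // placeholder for one chunk of the math
        std::chrono::duration<double, std::milli> t = std::chrono::steady_clock::now() - start;
        std::this_thread::sleep_for(t * x); // sleep for x * t milliseconds
    }
}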
Have a look at cpulimit. It sends SIGSTOP and SIGCONT as required to keep a process below a given CPU usage percentage.
Even still, WTF at "crazy complaints and outlandish reviews about your software killing PC performance". I'd be more likely to complain that your software was slow and not making the best use of my hardware, but I'm not your customer.
Edit: on Windows, SuspendThread() and ResumeThread() can probably produce similar behaviour.
I have a program that was built in C++ (MFC, Visual Studio 6.0) several years ago and has been running on a certain Windows machine for quite some time (more than 5 years). The PC was replaced a month ago (the old one died), and since then the program's timing behavior changed. I need help understanding why.
The main functionality of the program is to respond to keystrokes by sending out ON and OFF signals to an external card, with very accurate delay between the ON and the OFF. An example program flow:
> wait for keystroke...
> ! keystroke occurred
> send ON message
> wait 150ms
> send OFF message
Different keystrokes have different waiting periods associated with them, between 20ms and 150ms (a very deterministic time depending on the specific keystroke). The timing is very important. The waiting is executed using simple Sleep(). The accuracy of the sleep on the old PC was 1-2ms deviation. I can measure the timing externally to the computer (on the external card), so my measurement of the sleep time is very accurate. Please take into account this machine executed such ON-sleep-OFF cycles thousands of times a day for years, so the accuracy data I have is sound.
Since the PC was replaced the timing deviation is more than 10ms.
I did not install the previous PC, so it may have had some additional software packages installed. Also, I'm ashamed to admit I don't remember whether the previous PC was Windows 2000 or Windows XP. I'm quite sure it was XP, but not 100% (and I can't check now...). The new one is Windows XP.
I tried changing the sleeping mechanism to be based on timers, but the accuracy did not improve.
Can anything explain this change? Is there a software package that may have been installed on the previous PC that may fix the problem? Is there a best practice to deal with the problem?
The time resolution on XP is around 10ms - the system basically "ticks" every 10ms. Sleep is not a very good way to do accurate timing for that reason. I'm pretty sure win2000 has the same resolution but if I'm wrong that could be a reason.
You can change that resolution, at least down to 1 ms; see http://technet.microsoft.com/en-us/sysinternals/bb897569.aspx or use this: http://www.lucashale.com/timerresolution/. There is probably a registry key as well (Windows Media Player will change that timer too, probably only while it's running).
It could be that the resolution was somehow altered on your old machine.
If your main concern is precision, consider using a spinlock (busy-wait); a sketch follows below. The Sleep() function is a hint to the scheduler not to re-schedule the given thread for at least x ms; there's no guarantee that the thread will sleep for exactly the time specified.
Usually Sleep() will result in a delay of ~15 ms, or a multiple of ~15 ms, depending on the sleep value.
One good way to find out how it behaves is a loop like the following:

while (true) {
    printf("%lu\n", GetTickCount());
    Sleep(1);
}
It will also show that the behavior of this code differs between, say, Windows XP and Vista/Windows 7.
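Going back to the spinlock suggestion above, a minimal sketch of busy-waiting on the Windows high-resolution counter instead of calling Sleep() (this burns a core for the whole wait, so use it only for short, precision-critical delays such as the 150 ms in the question):

#include <windows.h>

// Busy-wait for the requested number of milliseconds using QueryPerformanceCounter.
void spin_wait_ms(double ms) {
    LARGE_INTEGER freq, start, now;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);
    do {
        QueryPerformanceCounter(&now);
    } while ((double)(now.QuadPart - start.QuadPart) * 1000.0 / (double)freq.QuadPart < ms);
}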
As others have mentioned, sleep has coarse accuracy.
I typically use Boost::asio for this kind of timing:
#include <boost/asio.hpp>

// Set up the io_service and deadline_timer
boost::asio::io_service io_service;
boost::asio::deadline_timer timer(io_service);

// Configure the wait period and block until the timer expires
timer.expires_from_now(boost::posix_time::millisec(5));
timer.wait();
Asio uses the most effective implementation for your platform; on Windows I believe it uses overlapped IO.
If I set the time period to 1 ms and loop the timer calls 10000 times, the total duration is typically about 10005-10100 ms. Very accurate, cross-platform code (though accuracy differs on Linux) and very easy to read.
I can't explain why your previous PC was so accurate though; Sleep has been +/- 10ms whenever I've used it - worse if the PC is busy.
Is your new PC multi-core and the old one single-core? The difference in timing accuracy may be the use of multiple threads and context switching.
Sleep is dependent on the system clock. Your new machine probably has a different timing than your previous machine. From the documentation:
This function causes a thread to relinquish the remainder of its time slice and become unrunnable for an interval based on the value of dwMilliseconds. The system clock "ticks" at a constant rate. If dwMilliseconds is less than the resolution of the system clock, the thread may sleep for less than the specified length of time. If dwMilliseconds is greater than one tick but less than two, the wait can be anywhere between one and two ticks, and so on. To increase the accuracy of the sleep interval, call the timeGetDevCaps function to determine the supported minimum timer resolution and the timeBeginPeriod function to set the timer resolution to its minimum. Use caution when calling timeBeginPeriod, as frequent calls can significantly affect the system clock, system power usage, and the scheduler. If you call timeBeginPeriod, call it one time early in the application and be sure to call the timeEndPeriod function at the very end of the application.
The documentation seems to imply that you can attempt to make it more accurate, but I wouldn't try that if I were you. Just use a timer.
What timers did you replace it with? If you used SetTimer(), that timer sucks too.
The correct solution is to use the higher-resolution TimerQueueTimer.
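A minimal sketch of using it for the ON/wait/OFF sequence from the question (the 150 ms delay matches the example there; the callback body and function names are illustrative):

#include <windows.h>

// Runs on a thread-pool thread roughly 150 ms after the timer is created.
VOID CALLBACK SendOffMessage(PVOID lpParameter, BOOLEAN TimerOrWaitFired) {
    // send the OFF message to the external card here
}

// On the keystroke: send the ON message, then arm a one-shot 150 ms timer.
void on_keystroke() {
    // send the ON message to the external card here
    HANDLE hTimer = NULL;
    CreateTimerQueueTimer(&hTimer, NULL, SendOffMessage, NULL,
                          150 /* due time, ms */, 0 /* one-shot */, WT_EXECUTEDEFAULT);
    // when finished with it: DeleteTimerQueueTimer(NULL, hTimer, INVALID_HANDLE_VALUE);
}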