How to decrease CPU usage of high resolution (10 micro second) precise timer?

I'm writing a timer for a complex communication application on Windows 10 with Qt5 and C++. I want to use at most 3 percent of the CPU with microsecond resolution.
Initially I used QTimer (Qt5) in this app. It had low CPU usage and a developer-friendly interface, but it was not as precise as I need. It only accepts milliseconds as a parameter, but I need microseconds. And the accuracy of the timer did not match even that resolution in many real-world situations, such as heavy CPU load. Sometimes the timer fires after 1 millisecond, sometimes after 15 milliseconds. You can see this problem in the picture:
I searched for a solution for days. In the end I found that Windows is not a real-time operating system (RTOS) and does not provide a high-resolution, precise timer.
I wrote my own high-resolution precise timer using CPU polling. I developed a singleton class that works in a separate thread. It works at 10 microsecond resolution.
But it consumes one logical CPU core, which is equivalent to 6.25 percent on a Ryzen 2700.
For my application this CPU usage is unacceptable. How can I reduce the CPU usage without giving up the high resolution?
This is the code that does the job:
void CsPreciseTimerThread::run()
{
    while (true)
    {
        QMutexLocker locker(&mMutex);
        for (int i = 0; i < mTimerList.size(); i++)
        {
            CsPreciseTimerMiddleLayer* timer = mTimerList[i];
            int interval = timer->getInterval();
            if ((timer->isActive() == true && timer->remainingTime() < 0))
            {
                timer->emitTimeout();
                timer->resetTime();
            }
        }
    }
}
I tried lowering the priority of the timer thread. I used this line:
QThread::start(QThread::Priority::LowestPriority);
And this:
QThread::start(QThread::Priority::IdlePriority);
These changes made the timer less precise, but the CPU usage didn't decrease.
After that I tried forcing the current thread to sleep for a few microseconds in the loop:
QThread::usleep(15);
As you might guess, the sleep call ruined the accuracy. Sometimes the timer sleeps longer than expected, like 10 ms or 15 ms.

I'm going to reference Windows APIs directly instead of the Qt abstractions.
I don't think you want to lower your thread priority. I think you want to raise it and use the smallest amount of Sleep between polls to balance latency against CPU overhead.
Two ideas:
1. In Windows Vista, Microsoft introduced the Multimedia Class Scheduler Service (MMCSS) specifically so that the Windows audio components could move out of kernel mode and run in user mode without impacting pro-audio tools. That's probably going to be helpful to you. It's not precisely a "real-time" guarantee, but it's meant for low-latency operations.
2. Going the classic way: raise your process and thread priority to high or critical, while using a reasonable sleep of a few milliseconds. That is, raise your thread priority to THREAD_PRIORITY_TIME_CRITICAL, then do a very small Sleep after the for loop completes. The sleep amount should be between 0 and 10 milliseconds. Some experimentation is required, but I would sleep no more than half the time to the next expected timeout, with a max of 10 ms. And when you are within N microseconds of your timer, you might need to just spin instead of yielding. You can also experiment with raising your process priority to REALTIME_PRIORITY_CLASS.
Be careful: a handful of runaway processes or threads at these higher priority levels that aren't sleeping can lock up the system.
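The "sleep while far away, spin when close" idea from the second suggestion could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses portable std::chrono and std::this_thread::sleep_for instead of the Windows Sleep call, it does not change any thread priority, and the names waitUntilHybrid, spinMargin and maxSleep are made up for this example. The right spin margin has to be found by experiment.

```cpp
#include <algorithm>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical helper (not part of Qt or the Windows API).
// Blocks until `deadline`. While more than `spinMargin` remains, it sleeps for
// at most half of the remaining time (capped at `maxSleep`); once inside the
// margin it busy-waits so a late wakeup from sleep cannot overshoot the deadline.
// Returns the time at which the wait actually ended.
inline Clock::time_point waitUntilHybrid(
    Clock::time_point deadline,
    std::chrono::microseconds spinMargin,
    std::chrono::milliseconds maxSleep = std::chrono::milliseconds(10))
{
    for (;;) {
        const Clock::time_point now = Clock::now();
        if (now >= deadline) {
            return now;
        }
        const Clock::duration remaining = deadline - now;
        if (remaining > spinMargin) {
            // Far from the deadline: give the CPU back for a while.
            const Clock::duration nap =
                std::min<Clock::duration>(remaining / 2, maxSleep);
            std::this_thread::sleep_for(nap);
        }
        // Otherwise: close to the deadline, loop again immediately (spin).
    }
}
```

With a margin larger than the worst sleep overshoot seen on the target machine, the CPU is only burned during the last stretch before each timeout. For a 10 microsecond period there is little room left to sleep at all, so the saving depends on how far apart the timeouts really are.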

Related

High resolution periodic timer in Qt on Windows (also OS X, Linux)

Everything I've found so far regarding timers is that it's, at best, available at a 1ms resolution. QTimer's docs claim that's the best it can provide.
I understand that OSes like Windows are not real-time OSes, but I still want to ask this question in hopes that someone knows something that could help.
So, I'm writing an app that requires a function to be called at a fairly precise but arbitrary interval, say 60 times/sec (full range: 59-61Hz). That means I need it to be called, on average, every ~16.67ms. This part of the design can't change.
The best timing source I currently have is vsync. When I go off of that, it's pretty good. It's not ideal, because the monitor's frequency is not exactly what I need to call this function at, but it can be somewhat compensated for.
The kicker is that the level of accuracy given the range I'm after is more or less available with timers, but not the level of precision I want. I can get a 16ms timer to hit exactly 16ms ~97% of the time. I can get a 17ms timer to hit exactly 17ms ~97% of the time. But no API exists to get me 16.67?
Is what I'm looking for simply not possible?
Background: The project is called Phoenix. Essentially, it's a libretro frontend. Libretro "cores" are game console emulators encapsulated in individual shared libraries. The API function being called at a specific rate is retro_run(). Each call emulates a game frame and calls callbacks for audio, video and so on. In order to emulate at a console's native framerate, we must call retro_run() at exactly (or as close to) this rate, hence the timer.

You could write a loop that checks std::chrono::high_resolution_clock::now() and calls std::this_thread::yield() until the right amount of time has elapsed. If the program needs to stay responsive while this is going on, you should do it in a separate thread from the one running the main loop.
Some example code:
http://en.cppreference.com/w/cpp/thread/yield
An alternative is to use QElapsedTimer, which can use the PerformanceCounter clock type. You will still need to check it from a loop, and you will probably still want to yield within that loop. Example code: http://doc.qt.io/qt-4.8/qelapsedtimer.html
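A rough sketch of such a yield loop, assuming std::chrono::steady_clock as the time source (the answer mentions high_resolution_clock; steady_clock is swapped in here because it never jumps backwards). The name runAtRate is hypothetical. Because each deadline is computed from the previous deadline rather than from "now", a fractional period such as 16.67 ms does not accumulate rounding or overshoot error from one call to the next.

```cpp
#include <chrono>
#include <functional>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: calls `frame` `count` times at `hz` calls per second,
// yielding the CPU while waiting. Returns the total time the loop took.
inline Clock::duration runAtRate(double hz, int count,
                                 const std::function<void()>& frame)
{
    // Convert the (possibly fractional) period to the clock's native tick type.
    const Clock::duration period = std::chrono::duration_cast<Clock::duration>(
        std::chrono::duration<double>(1.0 / hz));

    const Clock::time_point start = Clock::now();
    Clock::time_point next = start;
    for (int i = 0; i < count; ++i) {
        next += period;  // absolute schedule: errors do not add up
        while (Clock::now() < next) {
            std::this_thread::yield();
        }
        frame();
    }
    return Clock::now() - start;
}
```

Note that a yield loop still keeps a core busy whenever no other thread wants to run, so this trades CPU usage for timing accuracy.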

It is completely unnecessary to call retro_run at any highly controlled time in particular, as long as the average frame rate comes out right, and as long as your audio output buffers don't underflow.
First of all, you will likely have to measure real time using an audio-output-based timer. Ultimately, each retro_run produces a chunk of audio. The audio buffer state with the chunk added is your timing reference: if you run early, the buffer will be too full; if you run late, the buffer will be too empty.
This error measure can be fed into a PI controller, whose output gives you the desired delay until the next invocation of retro_run. This will automatically ensure that your average rate and phase are correct. Any systematic latencies in getting retro_run active will be integrated away, etc.
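As an illustration only, a PI controller driven by the audio buffer fill level might look like the sketch below. The names PiController and nextDelayMs are hypothetical, the buffer fill is assumed to be expressed in milliseconds of queued audio, and the gains would have to be tuned for a real frontend.

```cpp
// Minimal PI controller: output = kp * error + ki * (integral of error over time).
struct PiController {
    double kp;        // proportional gain (assumed value, needs tuning)
    double ki;        // integral gain (assumed value, needs tuning)
    double integral;  // accumulated error * time

    PiController(double kpGain, double kiGain)
        : kp(kpGain), ki(kiGain), integral(0.0) {}

    double update(double error, double dtMs)
    {
        integral += error * dtMs;
        return kp * error + ki * integral;
    }
};

// Hypothetical helper: delay in ms until the next retro_run call.
// A buffer fuller than the target means we ran early, so the delay grows;
// an emptier buffer means we ran late, so the delay shrinks (never below 0).
inline double nextDelayMs(PiController& pi, double nominalMs,
                          double bufferFillMs, double targetFillMs)
{
    const double error = bufferFillMs - targetFillMs;
    const double delay = nominalMs + pi.update(error, nominalMs);
    return delay < 0.0 ? 0.0 : delay;
}
```

The integral term is what removes a constant offset, such as a systematic wakeup latency: as long as the buffer stays slightly too full or too empty, the correction keeps growing until the average error is zero.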
Secondly, you need a way of waking yourself up at the correct moment in time. Given a target time (in terms of a performance counter, for example) to call retro_run, you'll need a source of events that wake your code up so that you can compare the time and retro_run when necessary.
The simplest way of doing this would be to reimplement QCoreApplication::notify. You'll have a chance to retro_run prior to the delivery of every event, in every event loop, in every thread. Since system events might not otherwise come often enough, you'll also want to run a timer to provide a more dependable source of events. It doesn't matter what the events are: any kind of event is good for your purpose.
I'm not familiar with the threading limitations of retro_run - perhaps you can run it in any one thread at a time. In that case, you'd want to run it on the next available thread in a pool, perhaps with the exception of the main thread. So, effectively, the events (including timer events) are used as cheap sources of execution context.
If you choose to have a thread dedicated to retro_run, it should be a high priority thread that simply blocks on a mutex. Whenever you're ready to run retro_run when a well-timed event comes, you unlock the mutex, and the thread should be scheduled right away, since it'll preempt most other threads - and certainly all threads in your process.
OTOH, on a low core count system, the high priority thread is likely to preempt the main (gui) thread, so you might as well invoke retro_run directly from whatever thread got the well-timed event.
It might of course turn out that using events from arbitrary threads to wake up the dedicated thread introduces too much worst-case latency or too much latency spread - this will be system-specific and you may wish to collect runtime statistics, switch threading and event source strategies on the fly, and stick with the best one. The choices are:
1. retro_run in a dedicated thread waiting on a mutex, unlock source being any thread with a well-timed event caught via notify,
2. retro_run in a dedicated thread waiting for a timer (or any other) event; events still caught via notify,
3. retro_run in a gui thread, unlock source being the events delivered to the gui thread, still caught via notify,
4. any of the above, but using timer events only - note that you don't care which timer events they are, they don't need to come from your timer,
5. as in #4, but selective to your timer only.

My implementation, based on Lorehead's answer. All time variables are in ms.
It of course needs a way to stop running and I was also thinking about subtracting half the (running average) difference between timeElapsed and interval to make the average +-n instead of +2n, where 2n is the average overshoot.
// Typical interval value: 1/60s ~= 16.67ms
void Looper::beginLoop( double interval ) {
    QElapsedTimer timer;
    int counter = 1;
    int printEvery = 240;
    int yieldCounter = 0;
    double timeElapsed = 0.0;
    forever {
        if( timeElapsed > interval ) {
            timer.start();
            counter++;
            if( counter % printEvery == 0 ) {
                qDebug() << "Yield() ran" << yieldCounter << "times";
                qDebug() << "timeElapsed =" << timeElapsed << "ms | interval =" << interval << "ms";
                qDebug() << "Difference:" << timeElapsed - interval << " -- " << ( ( timeElapsed - interval ) / interval ) * 100.0 << "%";
            }
            yieldCounter = 0;
            importantBlockingFunction();
            // Reset the frame timer
            timeElapsed = ( double )timer.nsecsElapsed() / 1000.0 / 1000.0;
        }
        timer.start();
        // Running this just once means massive overhead from calling timer.start() so many times so quickly
        for( int i = 0; i < 100; i++ ) {
            yieldCounter++;
            QThread::yieldCurrentThread();
        }
        timeElapsed += ( double )timer.nsecsElapsed() / 1000.0 / 1000.0;
    }
}

How to repeat a process ( or to set a period of process) in linux?

I have a process that does something and needs to be repeated with a period of 1 ms. How can I set the period of a process on Linux?
I am using Linux 3.2.0-4-rt-amd64 (with the RT-Preempt patch) on an Intel i7-2600 CPU (total 8 cores) @ 3.40 GHz.
Basically I have about 6 threads in the while loop shown in the code, and I want the threads to be executed every 1 ms. At the end I want to measure the latency of each thread.
So how do I set the period to 1 ms?
For example, in the following code, how can I repeat Task1 every 1 ms?
while(1){
    //Task1(having threads)
}
Thank you.

A call to usleep(1000) inside the while loop will do the job, i.e.:
while (1) {
    // Task1
    usleep(1000); // 1000 microseconds = 1 millisecond
}
EDIT
Since usleep() is already deprecated in favor of nanosleep(), let's use the latter instead:
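#include <time.h>  // nanosleep, struct timespec

struct timespec timer;
timer.tv_sec = 0;
timer.tv_nsec = 1000000L;  // 1,000,000 nanoseconds = 1 millisecond
while (1) {
    // Task1
    nanosleep(&timer, NULL);  // sleeps 1 ms on top of the time Task1 took
}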

Read time(7).
One millisecond is really a small period of time. (Can't you live with, e.g., a ten-millisecond delay?) I'm not sure regular processes on regular Linux on common laptop hardware are able to deal reliably with such a small period. Maybe you need RTLinux or at least real-time scheduling (see sched_setscheduler(2) and this question) and perhaps a specially configured recent 3.x kernel.
You can't be sure that your processing (inside your loop) takes less than a millisecond.
You should explain what your application is doing, and what happens inside the loop.
You might have some event loop; consider using ppoll(2), timer_create(2) (see also timer_getoverrun(2)...) and/or timerfd_create(2) and clock_nanosleep(2).
(I would try something using ppoll and timerfd_create, but I would accept some millisecond ticks being skipped.)
You should tell us more about your hardware and your kernel. I'm not even sure my desktop (i7-3770K processor, Asus P8Z77-V motherboard, 3.13.3 PREEMPT Linux kernel) is able to reliably deal with a single-millisecond delay.
(Of course, a plain loop simply calling clock_nanosleep, or better yet, using timerfd_create with ppoll, will usually do the job. But that is not reliable...)
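A minimal Linux-only sketch of the timerfd_create approach, under a few assumptions: it uses a plain blocking read() on the timer descriptor instead of ppoll (which would be needed to wait on several descriptors at once), the period must be shorter than one second, and the helper names makePeriodicTimer and waitForTick are made up for this example. The value read from the descriptor is the number of expirations since the last read, so a result greater than 1 means ticks were skipped.

```cpp
#include <cstdint>
#include <ctime>
#include <sys/timerfd.h>
#include <unistd.h>

// Hypothetical helper: creates a periodic timer on the monotonic clock.
// `periodNs` must be in the range 1 .. 999,999,999. Returns the file
// descriptor, or -1 on failure.
inline int makePeriodicTimer(long periodNs)
{
    const int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd < 0) {
        return -1;
    }
    itimerspec spec{};
    spec.it_value.tv_nsec = periodNs;     // first expiration
    spec.it_interval.tv_nsec = periodNs;  // every expiration after that
    if (timerfd_settime(fd, 0, &spec, nullptr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Hypothetical helper: blocks until the timer has expired at least once.
// Returns the number of expirations since the previous call (values above 1
// mean missed ticks), or 0 if the read failed.
inline std::uint64_t waitForTick(int fd)
{
    std::uint64_t expirations = 0;
    const ssize_t n = read(fd, &expirations, sizeof expirations);
    if (n != static_cast<ssize_t>(sizeof expirations)) {
        return 0;
    }
    return expirations;
}
```

A task loop would call waitForTick(fd) once per iteration and run Task1 afterwards. Unlike a relative nanosleep, the kernel keeps the schedule here, so the time Task1 takes does not stretch the period.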

Threads are slow when audio is off

I have 2 projects. One is built with C++ Builder, without MFC. The other is VC++ 11 with MFC.
When I create a thread that runs a loop (say, one that increments a progress bar position from 1 to 100, calling Sleep(10) each time), it works, of course, in both C++ Builder and VC++ MFC.
Now, Sleep(10) waits 10 milliseconds. OK. But it only behaves that way if I have a media player, Winamp, or anything else that produces sound open. If I close all media players, Winamp and other sound programs, each sleep in my threads takes longer than 10 milliseconds.
It takes around 50-100 ms each. If I play any music, it works as I expected.
I have no idea why this is happening. I first thought that I had made a mistake in the MFC app, but then why does the C++ Builder one also slow down?
And yes, I am positive it is sound related, because I even reformatted my Windows installation and disabled everything. Finally I discovered the sound issue.
Does my code need something?
Update:
Now, I went through the code and found that I used Sleep(1) in some places to wait 1 millisecond. The reason is that I move an object from left to right. If I remove the sleep, the movement is not visible because it is too fast. So I have to use Sleep(1). With Sleep(1), if audio is on then it works. If audio is off then it is very slow.
for (int i = 0; i <= 500; i++) {
    theDialog->staticText->SetWindowsPosition(NULL, i, 20, 0, 0);
    Sleep(1);
}
So, suggestions regarding this are really appreciated. What should I do?
I know this is the wrong way to do it. I should use something proper and valid instead. But what exactly? Which function or class would help me move static texts smoothly from one position to another?
Also, changing the thread priority has not helped.
Update 2:
Update 1 is another question :)

Sleep(10) will (as we know) wait for approximately 10 milliseconds. If there is a higher-priority thread that needs to run at that moment, the thread wakeup may be delayed. Multimedia threads probably run at real-time or high priority, so when you play sound, your thread wakeup gets delayed.
Refer to Jeffrey Richter's comment in Programming Applications for Microsoft Windows (4th Ed), section Sleeping in Chapter 7:
The system makes the thread not schedulable for approximately the
number of milliseconds specified. That's right—if you tell the system
you want to sleep for 100 milliseconds, you will sleep approximately
that long but possibly several seconds or minutes more. Remember that
Windows is not a real-time operating system. Your thread will probably
wake up at the right time, but whether it does depends on what else is
going on in the system.
Also, as per the MSDN page Multimedia Class Scheduler Service (Windows):
MMCSS ensures that time-sensitive processing receives prioritized access to CPU resources.
As per the above documentation, you can also control the percentage of CPU resources that will be guaranteed to low-priority tasks through a registry key.

Sleep(10) waits for at least 10 milliseconds. You have to write code to check how long you actually waited and if it's more than 10 milliseconds, handle that sanely in your code. Windows is not a real time operating system.
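One way to "handle that sanely" for the moving-text case is to derive the position from the measured elapsed time instead of counting how many sleeps have happened. The sketch below is only an illustration of that idea: it uses portable std::chrono and std::this_thread::sleep_for rather than the Windows Sleep call, and the names sleepAndMeasure and positionAt are hypothetical.

```cpp
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: sleeps for `requested` and returns how long the
// thread was really away, which may be noticeably longer.
inline std::chrono::microseconds sleepAndMeasure(std::chrono::milliseconds requested)
{
    const Clock::time_point before = Clock::now();
    std::this_thread::sleep_for(requested);
    return std::chrono::duration_cast<std::chrono::microseconds>(Clock::now() - before);
}

// Hypothetical helper: linear interpolation of a coordinate over time.
// Returns `from` at elapsed <= 0, `to` once `total` has passed, and a
// proportional value in between, no matter how uneven the sleeps were.
inline int positionAt(std::chrono::microseconds elapsed,
                      std::chrono::microseconds total,
                      int from, int to)
{
    if (elapsed >= total) {
        return to;
    }
    if (elapsed.count() <= 0) {
        return from;
    }
    return from + static_cast<int>((to - from) * elapsed.count() / total.count());
}
```

An animation loop would add up the values returned by sleepAndMeasure and pass the running total to positionAt. If a sleep takes 15 ms instead of 1 ms, the object simply moves further in that step, and the whole movement still finishes on time.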

The minimum resolution for Sleep() timing is set system wide with timeBeginPeriod() and timeEndPeriod(). For example passing timeBeginPeriod(1) sets the minimum resolution to 1 ms. It may be that the audio programs are setting the resolution to 1 ms, and restoring it to something greater than 10 ms when they are done. I had a problem with a program that used Sleep(1) that only worked fine when the XE2 IDE was running but would otherwise sleep for 12 ms. I solved the problem by directly setting timeBeginPeriod(1) at the beginning of my program.
See: http://msdn.microsoft.com/en-us/library/windows/desktop/dd757624%28v=vs.85%29.aspx

Qt Timer Problem

I have a timer whose interval is 100 ms, but it ticks every 125 ms. So I reduced the interval from 100 to 80, but it still ticks approximately every 125 ms. This timer is in the main thread. How can I solve this problem? I'm open to any suggestions.
Any help will be appreciated.

See http://doc.qt.nokia.com/4.2/qtimer.html
.... a timer cannot fire while your
application is busy doing something
else. In other words: the accuracy of
timers depends on the granularity of
your application.
and
Note that QTimer's accuracy depends
on the underlying operating system and
hardware. ... If Qt is unable to
deliver the requested number of timer
clicks, it will silently discard some.
NOTE: Some older versions of Qt use a different API that gives 20-50 ms accuracy.
No non-real-time OS gives any guarantee on sleep time. It depends on your CPU power and how busy your system is, so you should never rely on it.

Concurrency question about program running in OS

Here is what I know about concurrency in OS.
To run multiple tasks, the OS allocates a time slot on the CPU to each task. While task A runs, the other tasks "sleep", and so on.
Here is my question:
I have a timer program that counts keyboard/mouse inactivity. If the inactivity lasts 15 minutes, a screen saver program pops up.
If the concurrency theory is as I stated above, won't the timer be inaccurate? Each program running in the OS spends some time "sleeping", so the timer program may also be "sleeping", but in the real world time does not stop.

You would use services from the OS to provide a timer; you would not try to implement one yourself. If code had to run simply to count time, we would still be in the dark ages as far as computing is concerned.

In most operating systems, your task will not only be put to sleep when its time slice has been used but also while it is waiting for I/O (which is much more common for most programs).
Like AnthonyWJones said, use the operating system's concept of the current time.
The OS kernel's time slices are much too short to introduce any noticeable inaccuracy for a screen saver.

I think your waiting process can be very simple:
1. activityTime = time of last keypress or mouse movement [from OS]
2. now = current time [from OS]
3. If now >= 15 mins after activityTime, start screensaver
4. sleep for a few seconds and return to step 1
Because steps 1 and 2 use the OS and not some kind of running counter, you don't care if you get interrupted anytime during this activity.
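The check at the heart of those steps can be sketched as a comparison of two timestamps. In this illustration the function name shouldStartScreensaver is hypothetical, and both timestamps are assumed to come from the OS (here represented by std::chrono::steady_clock); how the last-activity time is obtained is platform specific and not shown.

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: true once at least `timeout` has passed between the
// last recorded user activity and `now`. Because it compares timestamps
// instead of counting loop iterations, it does not matter how long or how
// often the calling thread was descheduled in between.
inline bool shouldStartScreensaver(
    Clock::time_point lastActivity,
    Clock::time_point now,
    Clock::duration timeout = std::chrono::minutes(15))
{
    return now - lastActivity >= timeout;
}
```

If the polling thread oversleeps by a few seconds, the screensaver starts a few seconds late, but the error never accumulates.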

This could be language-dependent. In Java, it's not a problem. I suspect that all languages will "do the right thing" here. That's with the caveat that such timers are not extremely accurate anyway, and that usually you can only expect that your timer will sleep at least as long as you specify, but might sleep longer. That is, it might not be the active thread when the time runs out, and would therefore resume processing a little later.

See for example http://www.opengroup.org/onlinepubs/000095399/functions/sleep.html
The suspension time may be longer than requested due to the scheduling of other activity by the system.

The time you specify in sleep() is in realtime, not the cpu time your process uses. (As the CPU time is approximately 0 while your program sleeps.)
