High resolution periodic timer in Qt on Windows (also OS X, Linux) - c++

Everything I've found so far regarding timers is that it's, at best, available at a 1ms resolution. QTimer's docs claim that's the best it can provide.
I understand that OSes like Windows are not real-time OSes, but I still want to ask this question in hopes that someone knows something that could help.
So, I'm writing an app that requires a function to be called at a fairly precise but arbitrary interval, say 60 times/sec (full range: 59-61Hz). That means I need it to be called, on average, every ~16.67ms. This part of the design can't change.
The best timing source I currently have is vsync. When I go off of that, it's pretty good. It's not ideal, because the monitor's frequency is not exactly what I need to call this function at, but it can be somewhat compensated for.
The kicker is that the level of accuracy given the range I'm after is more or less available with timers, but not the level of precision I want. I can get a 16ms timer to hit exactly 16ms ~97% of the time. I can get a 17ms timer to hit exactly 17ms ~97% of the time. But no API exists to get me 16.67?
Is what I'm looking for simply not possible?
Background: The project is called Phoenix. Essentially, it's a libretro frontend. Libretro "cores" are game console emulators encapsulated in individual shared libraries. The API function being called at a specific rate is retro_run(). Each call emulates a game frame and calls callbacks for audio, video and so on. In order to emulate at a console's native framerate, we must call retro_run() at exactly (or as close to) this rate, hence the timer.

You could write a loop that checks std::chrono::high_resolution_clock() and std::this_thread::yield() until the right time has elapsed. If the program needs to be responsive while this is going on, you should do it in a separate thread from the one checking the main loop.
Some example code:
http://en.cppreference.com/w/cpp/thread/yield
An alternative is to use QElapsedTimer with a value of PerformanceCounter. You will still need to check it from a loop, and probably will still want to yield within that loop. Example code: http://doc.qt.io/qt-4.8/qelapsedtimer.html

It is completely unnecessary to call retro_run at any highly controlled time in particular, as long as the average frame rate comes out right, and as long as your audio output buffers don't underflow.
First of all, you are likely to have to measuring the real time using an audio-output-based timer. Ultimately, each retro_run produces a chunk of audio. The audio buffer state with the chunk added is your timing reference: if you run early, the buffer will be too full, if you run late, the buffer will be too empty.
This error measure can be fed into a PI controller, whose output gives you the desired delay until the next invocation of retro_run. This will automatically ensure that your average rate and phase are correct. Any systematic latencies in getting retro_run active will be integrated away, etc.
Secondly, you need a way of waking yourself up at the correct moment in time. Given a target time (in terms of a performance counter, for example) to call retro_run, you'll need a source of events that wake your code up so that you can compare the time and retro_run when necessary.
The simplest way of doing this would be to reimplement QCoreApplication::notify. You'll have a chance to retro_run prior to the delivery of every event, in every event loop, in every thread. Since system events might not otherwise come often enough, you'll also want to run a timer to provide a more dependable source of events. It doesn't matter what the events are: any kind of event is good for your purpose.
I'm not familiar with threading limitations of retro_run - perhaps you can run it in any one thread at a time. In such case, you'd want to run it on the next available thread in a pool, perhaps with the exception of the main thread. So, effectively, the events (including timer events) are used as energetically cheap sources of giving you execution context.
If you choose to have a thread dedicated to retro_run, it should be a high priority thread that simply blocks on a mutex. Whenever you're ready to run retro_run when a well-timed event comes, you unlock the mutex, and the thread should be scheduled right away, since it'll preempt most other threads - and certainly all threads in your process.
OTOH, on a low core count system, the high priority thread is likely to preempt the main (gui) thread, so you might as well invoke retro_run directly from whatever thread got the well-timed event.
It might of course turn out that using events from arbitrary threads to wake up the dedicated thread introduces too much worst-case latency or too much latency spread - this will be system-specific and you may wish to collect runtime statistics, switch threading and event source strategies on the fly, and stick with the best one. The choices are:
retro_run in a dedicated thread waiting on a mutex, unlock source being any thread with a well-timed event caught via notify,
retro_run in a dedicated thread waiting for a timer (or any other) event; events still caught via notify,
retro_run in a gui thread, unlock source being the events delivered to the gui thread, still caught via notify,
any of the above, but using timer events only - note that you don't care which timer events they are, they don't need to come from your timer,
as in #4, but selective to your timer only.

My implementation based on Lorehead's answer. Time for all variables are in ms.
It of course needs a way to stop running and I was also thinking about subtracting half the (running average) difference between timeElapsed and interval to make the average +-n instead of +2n, where 2n is the average overshoot.
// Typical interval value: 1/60s ~= 16.67ms
void Looper::beginLoop( double interval ) {
QElapsedTimer timer;
int counter = 1;
int printEvery = 240;
int yieldCounter = 0;
double timeElapsed = 0.0;
forever {
if( timeElapsed > interval ) {
timer.start();
counter++;
if( counter % printEvery == 0 ) {
qDebug() << "Yield() ran" << yieldCounter << "times";
qDebug() << "timeElapsed =" << timeElapsed << "ms | interval =" << interval << "ms";
qDebug() << "Difference:" << timeElapsed - interval << " -- " << ( ( timeElapsed - interval ) / interval ) * 100.0 << "%";
}
yieldCounter = 0;
importantBlockingFunction();
// Reset the frame timer
timeElapsed = ( double )timer.nsecsElapsed() / 1000.0 / 1000.0;
}
timer.start();
// Running this just once means massive overhead from calling timer.start() so many times so quickly
for( int i = 0; i < 100; i++ ) {
yieldCounter++;
QThread::yieldCurrentThread();
}
timeElapsed += ( double )timer.nsecsElapsed() / 1000.0 / 1000.0;
}
}

Related

How to efficiently handle incoming delayed events on a single timeline?

I want to implement the algorithm that awaits for some events and handles them after some delay. Each event has it's own predefined delay. The handler may be executed in a separate thread. The issues with the CPU throttling, the host overload, etc. may be ignored - it's not intended to be a precise real-time system.
Example.
At moment N arrives an event with delay 1 second. We want to handle it at moment N + 1 sec.
At moment N + 0.5 sec arrives another event with delay 0.3 seconds. We want to handle it at moment N + 0.8 sec.
Approaches.
The only straightforward approach that comes to my mind is to use a loop with minimal possible delay inbetween iterations, like every 10 ms, and check if any event on our timeline should be handled now. But it's not a good idea since the delays may vary on scale from 10 ms to 10 minutes.
Another approach is to have a single thread that sleeps between events. But I can't figure out how to forcefully "wake" it when there is a new event that should be handled between now and the next scheduled wake up.
Also it's possible to use a thread per event and just sleep, but there may be thousands of simultanious events which effectively may lead to running out of threads.
The solution can be language-agnostic, but I prefer the C++ STD library solution.
Another approach is to have a single thread that sleeps between events. But I can't figure out how to forcefully "wake" it when there is a new event that should be handled between now and the next scheduled wake up.
I suppose solution to these problems are, at least on *nix systems, poll or epoll with some help of timer. It allows you to make the thread sleep until some given event. The given event may be something appearing on stdin or timer timeout. Since the question was about a general algorithm/idea of algorithm and the code would take a lot of space I am giving just pseudocode:
epoll = create_epoll();
timers = vector<timer>{};
while(true) {
event = epoll.wait_for_event(timers);
if (event.is_timer_timeout()) {
t = timers.find_timed_out();
t.handle_event();
timers.erase(t);
} else if (event.is_incoming_stdin_data()) {
data = stdin.read();
timers.push_back(create_timer(data));
}
}
Two threads that share a priority queue.
Arrivals thread: Wait for arrival. When event arrives calculate time for handler to run. Add handler to queue with priority of handler time ( the top of the queue will be the next event that is to be handled
Handler thread: Is now equal to time of handler at top of queue then run handler. Sleep for clock resolution.
Note: check if your queue is thread safe. If not, then you will have to use a mutex.
This looks simple, but there a lot of gotchas waiting for the inexperienced. So, I would not recommend coding this from scratch. It is better to use a library. The classic is boost::asio. However, this is beginning to show its age and has way more bells and whistles than are needed. So, personally, I use something more lightweight and coded in C++17 - a non blocking event waiter class I coded that you can get from https://github.com/JamesBremner/await. Notice the sample application using this class which does most of what you require https://github.com/JamesBremner/await/wiki/Event-Server

How to decrease CPU usage of high resolution (10 micro second) precise timer?

I'm writing up a timer for some complex communication application in windows 10 with qt5 and c++. I want to use max 3 percent of CPU with micro second resolution.
Initially i used qTimer (qt5) in this app. It was fine with low CPU usage and developer friendly interface. But It was not precise as i need.It takes only millisecond as parameter but i need microsecond. And the accuracy of the timer wasn't equal this resolution in many real-world situations like heavy load on cpu. Sometimes the timer fires at 1 millisecond, sometimes 15 millisecond. You can see this problem in picture:
I searched a solution for days. But in the end i found Windows is a non real-time Operating System (RTOS) and don't give high resolution and precise timer.
I wrote my own High resolution precise timer with CPU polling for this goal. I developed a singleton class working in separate thread. It works at 10 micro second resolution.
But it is consuming one logical core in CPU. Equivalent to 6.25 percent at ryzen 2700.
For my application this CPU usage is unacceptable. How can i reduce this CPU usage without give high resolution away ?
This is the code that does the job:
void CsPreciseTimerThread::run()
{
while (true)
{
QMutexLocker locker(&mMutex);
for (int i=0;i<mTimerList.size();i++)
{
CsPreciseTimerMiddleLayer* timer = mTimerList[i];
int interval = timer->getInterval();
if ( (timer->isActive() == true&&timer->remainingTime()<0))
{
timer->emitTimeout();
timer->resetTime();
}
}
}
}
I tried to down priority of timer thread. I used this lines:
QThread::start(QThread::Priority::LowestPriority);
And this:
QThread::start(QThread::Priority::IdlePriority);
That changes makes timer less precise but CPU usage didn't decrease.
After that i tried force the current thread to sleep for few microseconds in loop.
QThread::usleep(15);
As you might guess sleep function did screw up the accuracy. Sometimes timer sleeps longer than expected , like 10 ms or 15 ms.
I'm going to reference Windows APIs directly instead of the Qt abstractions.
I don't think you want to lower your thread priority, I think you want to raise your thread priority and use the smallest amount of Sleep between polling to balance between latency and CPU overhead.
Two ideas:
In Windows Vista, they introduced the Multimedia Class Scheduler Service specifically so that they could move the Windows audio components out of kernel mode and running in user mode, without impacting pro-audio tools. That's probably going to be helpful to you - it's not precisesly "real time" guararteed, but it's meant for low latency operations.
Going the classic way - raise your process and thread priority to high or critical, while using a reasonable sleep statement of a few milliseconds. That is, raise your thread priority to THREAD_PRIORITY_TIME_CRITICAL. Then do a very small Sleep after completion of the for loop. This sleep amount should be between 0..10 milliseconds. Some experimentation required, but I would sleep no more than half the time to the next expected timeout, with a max of 10ms. And when you are within N microseconds of your timer, you might need to just spin instead of yielding. Some experimentation is required. You can also experiment with raising your Process priority to REALTIME_PRIORITY_CLASS.
Be careful - A handful of runaway processes and threads at these higher priority levels that isn't sleeping can lock up the system.

Effective frame limiting

I have a simulation that I am trying to convert to "real time". I say "real time" because its okay for performance to dip if needed (slowing down time for the observers/clients too). However, if there is a small number of objects, I want to limit the performance so that it runs at a steady frame rate (~100 FPS in this case).
I tried sleep() and Sleep() for linux and windows respectively but it doesn't seem to be accurate enough as the FPS really dips to a fraction of what I was aiming for. I suppose this scenario is common for games, especially online games but I was not able to find any helpful material on the subject. What is the preferable way of frame limiting? Is there a sleep method that can guarantee that it won't give up more time than what was specified?
Note: I'm running this on 2 different clusters (linux and windows) and all nodes only have built-in video. So I have to implement limiting on both platforms and it shouldn't be video card based (if there is even such a thing). I also need to implement the limiting on just one thread/node because there is already synchronization between nodes and the others would automatically be limited if one thread is properly limited.
Edit: some pseudo code that shows how I implemented the current limiter:
while (ProcessControlMessages())
{
uint64 tStart;
SimulateFrame();
uint64 newT =_context.GetTimeMs64();
if (newT - tStart < DESIRED_FRAME_RATE_DURATION)
this_thread::sleep_for(chrono::milliseconds(DESIRED_FRAME_RATE_DURATION - (newT - tStart)));
}
I was also thinking if I could do the limiting every N frames, where N is a fraction of the desired frame rate. I'll give it a try and report back.
For games a frame limiter is usually inadequate. Instead, the methods that update the game state (in your case SimulateFrame()) are kept frame rate independent. E.g. if you want to move an object, then the actual offset is the object's speed multiplied with the last frame's duration. Similarly, you can do this for all kind of calculations.
This approach has the advantage that the user gets maximum frame rate while maintaining the real-timeness. However, you should watch out that the frame durations don't get too small ( < 1 ms). This could result in inaccurate calculations. In this case a small sleep with a fixed duration could help.
This is how games usually handle this problem. You have to check if your simulation is appropriate for this technique, too.
Instead of having each frame try to sleep for long enough to be a full frame, have them sleep to try to average out. Keep a global/thread owned time count. for each frame have a "desired earliest end time," calculated from the previous desired earliest end time, rather than from the current time
tGoalEndTime = _context.GetTimeMS64() + DESIRED_FRAME_RATE_DURATION;
while (ProcessControlMessages())
{
SimulateFrame();
uint64 end =_context.GetTimeMs64();
if (end < tGoalEndTime) {
this_thread::sleep_for(chrono::milliseconds(tGoalEndTime - end)));
tGoalEndTime += DESIRED_FRAME_RATE_DURATION;
} else {
tGoalEndTime = end; // we ran over, pretend we didn't and keep going
}
Note: this uses your example's sleep_for because I wanted to show the minimum number of changes to enact it. sleep_until works better here.
The trick is that any frame that sleeps too long immediately causes the next few frames to rush to catch up.
Note: You cannot get any timing within 2ms (20% jitter on 100fps) on modern consumer OSs. The quantum for threads on most consumer OSs is around 100ms, so the instant you sleep, you may sleep for multiple quantums before it is your turn. sleep_until may use a OS specific technique to have less jitter, but you can't rely on it.

C++ how to check time every 10 seconds?

I'm writing a check point. I'm checking every time I run a loop. I think this will waste a lot of CPU time. I wonder how to check with the system time every 10 seconds?
time_t start = clock();
while(forever)
{
if(difftime(clock(),start)/CLOCKS_PER_SEC >=timeLimit)
{
break;
}
}
The very short answer is that this is very difficult, if you're a novice programmer.
Now a few possiblilites:
Sleep for ten seconds. That means your program is basically pointless.
Use alarm() and signal handlers. This is difficult to get right, because you mustn't do anything fancy inside the signal handler.
Use a timerfd and integrate timing logic into your I/O loop.
Set up a dedicated thread for the timer (which can then sleep); this is exceedingly difficult because you need to think about synchronising all shared data access.
The point to take home here is that your problem doesn't have a simple solution. You need to integrate the timing logic deeply into your already existing program flow. This flow should be some sort of "main loop" (e.g. an I/O multiplexing loop like epoll_wait or select), possibly multi-threaded, and that loop should pick up the fact that the timer has fired.
It's not that easy.
Here's a tangent, possibly instructive. There are basically two kinds of computer program (apart from all the other kinds):
One kind is programs that perform one specific task, as efficiently as possible, and are then done. This is for example something like "generate an SSL key pair", or "find all lines in a file that match X". Those are the sort of programs that are easy to write and understand as far as the program flow is concerned.
The other kind is programs that interact with the user. Those programs stay up indefinitely and respond to user input. (Basically any kind of UI or game, but also a web server.) From a control flow perspective, these programs spend the vast majority of their time doing... nothing. They're just idle waiting for user input. So when you think about how to program this, how do you make a program do nothing? This is the heart of the "main loop": It's a loop that tells the OS to keep the process asleep until something interesting happens, then processes the interesting event, and then goes back to sleep.
It isn't until you understand how to do nothing that you'll be able to design programs of the second kind.
If you need precision, you can place a call to select() with null parameters but with a delay. This is accurate to the millisecond.
struct timeval timeout= {10, 0};
select(1,NULL,NULL,NULL, &timeout);
If you don't, just use sleep():
sleep(10);
Just add a call to sleep to yield CPU time to the system:
time_t start = clock();
while(forever)
{
if(difftime(clock(),start)/CLOCKS_PER_SEC >=timeLimit)
{
break;
}
sleep(1); // <<< put process to sleep for 1s
}
You can use a event loop in your program and schedule timer to do a callback. For example you can use libev to make an event loop and add timer.
ev_timer_init (timer, callback, 0., 5.);
ev_timer_again (loop, timer);
...
timer->again = 17.;
ev_timer_again (loop, timer);
...
timer->again = 10.;
ev_timer_again (loop, timer);
If you code in a specific toolkit you can use other event loops, gtk, qt, glib has own event loops so you can use them.
The simplest approach (in a single threaded environment), would be to sleep for some time and repeatedly check if the total waiting time has expired.
int sleepPeriodMs = 500;
time_t start = clock();
while(forever)
{
while(difftime(clock(),start)/CLOCKS_PER_SEC <timeLimit) // NOTE: Change of logic here!
{
sleep(sleepPeriod);
}
}
Please note, that sleep() is not very accurate. If you need higher accuracy timing (i.e. better than 10ms of resolution) you might need to dig deeper. Also, with C++11 there is the <chrono> header that offers a lot more of functionality.
using namespace std::chrono;
int sleepPeriodMs = 500;
time_t start = clock();
while(forever)
{
auto start = system_clock()::now()
// do some stuff that takes between [0..10[ seconds
std::this_thread::sleep_until(start+seconds(10));
}

What is the best way to exit out of a loop after an elapsed time of 30ms in C++

What is the best way to exit out of a loop as close to 30ms as possible in C++. Polling boost:microsec_clock ? Polling QTime ? Something else?
Something like:
A = now;
for (blah; blah; blah) {
Blah();
if (now - A > 30000)
break;
}
It should work on Linux, OS X, and Windows.
The calculations in the loop are for updating a simulation. Every 30ms, I'd like to update the viewport.
The calculations in the loop are for
updating a simulation. Every 30ms, I'd
like to update the viewport.
Have you considered using threads? What you describe seems the perfect example of why you should use threads instead of timers.
The main process thread keeps taking care of the UI, and have a QTimer set to 30ms to update it. It locks a QMutex to have access to the data, performs the update, and releases the mutex.
The second thread (see QThread) does the simulation. For each cycle, it locks the QMutex, does the calculations and releases the mutex when the data is in a stable state (suitable for the UI update).
With the increasing trend on multi-core processors, you should think more and more on using threads than on using timers. Your applications automatically benefits from the increased power (multiple cores) of new processors.
While this does not answer the question, it might give another look at the solution. What about placing the simulation code and user interface in different threads? If you use Qt, periodic update can be realized using a timer or even QThread::msleep(). You can adapt the threaded Mandelbrot example to suit your need.
The code snippet example in this link pretty much does what you want:
http://www.cplusplus.com/reference/clibrary/ctime/clock/
Adapted from their example:
void runwait ( int seconds )
{
clock_t endwait;
endwait = clock () + seconds * CLOCKS_PER_SEC ;
while (clock() < endwait)
{
/* Do stuff while waiting */
}
}
If you need to do work until a certain time has elapsed, then docflabby's answer is spot-on. However, if you just need to wait, doing nothing, until a specified time has elapsed, then you should use usleep()
Short answer is: you can't in general, but you can if you are running on the right OS or on the right hardware.
You can get CLOSE to 30ms on all the OS's using an assembly call on Intel systems and something else on other architectures. I'll dig up the reference and edit the answer to include the code when I find it.
The problem is the time-slicing algorithm and how close to the end of your time slice you are on a multi-tasking OS.
On some real-time OS's, there's a system call in a system library you can make, but I'm not sure what that call would be.
edit: LOL! Someone already posted a similiar snippet on SO: Timer function to provide time in nano seconds using C++
VonC has got the comment with the CPU timer assembly code in it.
According to your question, every 30ms you'd like to update the viewport. I wrote a similar app once that probed hardware every 500ms for similar stuff. While this doesn't directly answer your question, I have the following followups:
Are you sure that Blah(), for updating the viewport, can execute in less than 30ms in every instance?
Seems more like running Blah() would be done better by a timer callback.
It's very hard to find a library timer object that will push on a 30ms interval to do updates in a graphical framework. On Windows XP I found that the standard Win32 API timer that pushes window messages upon timer interval expiration, even on a 2GHz P4, couldn't do updates any faster than a 300ms interval, no matter how low I set the timing interval to on the timer. While there were high performance timers available in the Win32 API, they have many restrictions, namely, that you can't do any IPC (like update UI widgets) in a loop like the one you cited above.
Basically, the upshot is you have to plan very carefully how you want to have updates occur. You may need to use threads, and look at how you want to update the viewport.
Just some things to think about. They caught me by surprise when I worked on my project. If you've thought these things through already, please disregard my answer :0).
You might consider just updating the viewport every N simulation steps rather than every K milliseconds. If this is (say) a serious commercial app, then you're probably going to want to go the multi-thread route suggested elsewhere, but if (say) it's for personal or limited-audience use and what you're really interested in is the details of whatever it is you're simulating, then every-N-steps is simple, portable and may well be good enough to be getting on with.
See QueryPerformanceCounter and QueryPerformanceFrequency
If you are using Qt, here is a simple way to do this:
QTimer* t = new QTimer( parent ) ;
t->setInterval( 30 ) ; // in msec
t->setSingleShot( false ) ;
connect( t, SIGNAL( timeout() ), viewPort, SLOT( redraw() ) ) ;
You'll need to specify viewPort and redraw(). Then start the timer with t->start().