Uniformly Regulating Program Execution Rate [Windows C++]

First off, I found a lot of information on this topic, but no solutions that solved the issue unfortunately.
I'm simply trying to regulate my C++ program to run at 60 iterations per second. I've tried everything from GetClockTicks() to GetLocalTime() to help in the regulation but every single time I run the program on my Windows Server 2008 machine, it runs slower than on my local machine and I have no clue why!
I understand that "clock"-based function calls return CPU time spent on execution, so I moved to GetLocalTime, tried to take the difference between the start time and the stop time, and then call Sleep((1000 / FPS) - millisecondExecutionTime).
My local machine is quite a bit faster than the server's CPU, so obviously the thought was that it was going off of CPU ticks, but that doesn't explain why GetLocalTime doesn't work. I've been basing this method off of http://www.lazyfoo.net/SDL_tutorials/lesson14/index.php, swapping get_ticks() for every time-returning function I could find on the web.
For example take this code:
#include <Windows.h>
#include <time.h>
#include <string>
#include <iostream>
using namespace std;
int main() {
    int tFps = 60;
    int counter = 0;
    SYSTEMTIME gStart, gEnd, start_time, end_time;
    GetLocalTime( &gStart );
    bool done = false;
    while (!done) {
        GetLocalTime( &start_time );
        Sleep(10);
        counter++;
        GetLocalTime( &end_time );
        int startTimeMilli = (start_time.wSecond * 1000 + start_time.wMilliseconds);
        int endTimeMilli = (end_time.wSecond * 1000 + end_time.wMilliseconds);
        int time_to_sleep = (1000 / tFps) - (endTimeMilli - startTimeMilli);
        if (counter > 240)
            done = true;
        if (time_to_sleep > 0)
            Sleep(time_to_sleep);
    }
    GetLocalTime( &gEnd );
    cout << "Total Time: " << (gEnd.wSecond*1000 + gEnd.wMilliseconds) - (gStart.wSecond*1000 + gStart.wMilliseconds) << endl;
    cin.get();
}
For this code snippet, run on my computer (3.06 GHz) I get a total time of 3856 ms, whereas on my server (2.53 GHz) I get 6256 ms. So it could potentially be the speed of the processor, though the ratio 2.53/3.06 is only .826797386, whereas 3856/6271 is .614893956.
I can't tell whether the Sleep function is doing something drastically different than expected (though I don't see why it would), or whether it is my method for getting the time (even though it should be wall-clock time in ms, not clock-cycle time). Any help would be greatly appreciated, thanks.

For one thing, Sleep's default resolution is the system clock interval - usually either 10 ms or 15 ms, depending on the Windows edition. To get a resolution of, say, 1 ms, you have to issue a timeBeginPeriod(1), which reprograms the timer hardware to fire (roughly) once every millisecond.
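As a rough sketch of how that call is used (the loop body is a placeholder, and winmm.lib must be linked), timeBeginPeriod is always paired with timeEndPeriod:
#include <Windows.h>
#include <Mmsystem.h>                 // timeBeginPeriod / timeEndPeriod
#pragma comment(lib, "winmm.lib")

int main()
{
    // Ask for 1 ms timer granularity; this is a system-wide setting,
    // so always pair it with a matching timeEndPeriod.
    if (timeBeginPeriod(1) != TIMERR_NOERROR) {
        // the hardware/OS refused 1 ms; Sleep keeps its default coarseness
    }

    // ... timing-sensitive loop using Sleep() goes here ...

    timeEndPeriod(1);                 // restore the previous resolution
    return 0;
}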

In your main loop you can do something like this:
int main()
{
    // Timers
    LONGLONG curTime  = 0;
    LONGLONG nextTime = 0;
    int loops = 0;   // frame-skip counter (never incremented here; see the note below the header)

    Timers::SWGameClock::GetInstance()->GetTime(&nextTime);

    while (true) {
        Timers::SWGameClock::GetInstance()->GetTime(&curTime);
        if (curTime > nextTime && loops <= MAX_FRAMESKIP) {
            nextTime += Timers::SWGameClock::GetInstance()->timeCount;
            // Business logic goes here and occurs at the specified framerate
        }
    }
}
using this time library
#include "stdafx.h"   // assumed to pull in <Windows.h>; timeGetTime also needs
                      // <Mmsystem.h> and linking against winmm.lib
// ... plus the header that declares Timers::SWGameClock (shown below)

LONGLONG cacheTime;

Timers::SWGameClock* Timers::SWGameClock::pInstance = NULL;

Timers::SWGameClock* Timers::SWGameClock::GetInstance ( ) {
    if (pInstance == NULL) {
        pInstance = new SWGameClock();
    }
    return pInstance;
}

Timers::SWGameClock::SWGameClock(void) {
    this->Initialize ( );
}

void Timers::SWGameClock::GetTime ( LONGLONG * t ) {
    // Use timeGetTime() if QueryPerformanceCounter is not supported
    if (!QueryPerformanceCounter( (LARGE_INTEGER *) t)) {
        *t = timeGetTime();
    }
    cacheTime = *t;
}

LONGLONG Timers::SWGameClock::GetTimeElapsed ( void ) {
    LONGLONG t;
    // Use timeGetTime() if QueryPerformanceCounter is not supported
    if (!QueryPerformanceCounter( (LARGE_INTEGER *) &t )) {
        t = timeGetTime();
    }
    return (t - cacheTime);
}

void Timers::SWGameClock::Initialize ( void ) {
    if ( !QueryPerformanceFrequency((LARGE_INTEGER *) &this->frequency) ) {
        this->frequency = 1000; // 1000 ms to one second
    }
    this->timeCount = DWORD(this->frequency / TICKS_PER_SECOND);
}

Timers::SWGameClock::~SWGameClock(void)
{
}
with a header file that contains the following:
// Required for rendering stuff on time
#pragma once
#include <Windows.h>   // for DWORD / LONGLONG

#define TICKS_PER_SECOND 60
#define MAX_FRAMESKIP 5

namespace Timers {
    class SWGameClock
    {
    public:
        static SWGameClock* GetInstance();
        void Initialize ( void );
        DWORD timeCount;
        void GetTime ( LONGLONG* t );
        LONGLONG GetTimeElapsed ( void );
        LONGLONG frequency;
        ~SWGameClock(void);
    protected:
        SWGameClock(void);
    private:
        static SWGameClock* pInstance;
    }; // SWGameClock
} // Timers
This will ensure that your code runs at 60 FPS (or whatever you put in), though you can probably drop MAX_FRAMESKIP, as frame skipping isn't actually implemented in this example!

You could try a WinMain function, use the SetTimer function and a regular message loop (you can also take advantage of the filter mechanism of GetMessage(...)), test for the WM_TIMER message at the requested interval, and when your counter reaches the limit call PostQuitMessage(0) to terminate the message loop.
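A rough sketch of that approach (the 240-tick cutoff just mirrors the counter in the question; error handling is omitted):
#include <Windows.h>

static int g_counter = 0;

void CALLBACK TimerProc(HWND, UINT, UINT_PTR, DWORD)
{
    // 60 Hz work goes here
    if (++g_counter >= 240)
        PostQuitMessage(0);           // ends the message loop below
}

int WINAPI WinMain(HINSTANCE, HINSTANCE, LPSTR, int)
{
    // A NULL-hwnd timer posts WM_TIMER to this thread's queue roughly every
    // 1000/60 ms; DispatchMessage routes it to TimerProc. Note that SetTimer's
    // resolution is only about 10-15 ms.
    SetTimer(NULL, 0, 1000 / 60, TimerProc);

    MSG msg;
    while (GetMessage(&msg, NULL, 0, 0) > 0)
    {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
    return 0;
}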

For a duty cycle that fast, you can use a high-accuracy timer (like QueryPerformanceCounter) and a busy-wait loop.
If you had a much lower duty cycle, but still wanted precision, then you could Sleep for part of the time and then eat up the leftover time with a busy-wait loop.
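A sketch of that hybrid wait, assuming a deadline expressed in QueryPerformanceCounter ticks (the 2 ms safety margin is an arbitrary choice):
#include <Windows.h>

// Sleep for the coarse part of the wait, then spin on the high-resolution
// counter for the remainder.
void WaitUntil(LONGLONG deadlineTicks, LONGLONG ticksPerSecond)
{
    LARGE_INTEGER now;
    QueryPerformanceCounter(&now);

    LONGLONG remainingMs = (deadlineTicks - now.QuadPart) * 1000 / ticksPerSecond;
    if (remainingMs > 2)
        Sleep(static_cast<DWORD>(remainingMs - 2));   // give up the CPU for most of the wait

    do {                                              // burn the last couple of milliseconds
        QueryPerformanceCounter(&now);
    } while (now.QuadPart < deadlineTicks);
}
Call it with the next frame's deadline (the current counter value plus frequency/60) at the end of each iteration.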
Another option is to use something like DirectX to sync yourself to the VSync interrupt (which is almost always 60 Hz). This can make a lot of sense if you're coding a game or a/v presentation.
Windows is not a real-time OS, so there will never be a perfect way to do something like this, as there's no guarantee your thread will be scheduled to run exactly when you need it to.
Note that in the remarks for Sleep, the actual amount of time will be at least one "tick" and possibly one whole "tick" longer than the delay you requested before the thread is scheduled to run again (and that assumes the thread gets scheduled promptly at all). The "tick" can vary a lot depending on hardware and the version of Windows. It is commonly in the 10-15 ms range, and I've seen it as bad as 19 ms. For 60 Hz you need 16.666 ms per iteration, so this is obviously not nearly precise enough to give you what you need.
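If you want to see what timer granularity a given machine supports at all, you can query the multimedia timer's capabilities (a small sketch; winmm.lib must be linked):
#include <Windows.h>
#include <Mmsystem.h>                 // timeGetDevCaps, TIMECAPS
#pragma comment(lib, "winmm.lib")
#include <stdio.h>

int main()
{
    TIMECAPS caps;
    // wPeriodMin is the finest resolution timeBeginPeriod can request,
    // wPeriodMax the coarsest period the timer supports.
    if (timeGetDevCaps(&caps, sizeof(caps)) == TIMERR_NOERROR)
        printf("Timer resolution: %u ms (min) to %u ms (max)\n",
               caps.wPeriodMin, caps.wPeriodMax);
    return 0;
}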

What about rendering (iterating) based on the time elapsed between frames? Consider creating a void render(double timePassed) function and rendering based on the timePassed parameter instead of putting the program to sleep.
Imagine, for example, that you want to render a ball falling or bouncing. You know its speed, acceleration, and all the other physics you need. Calculate the position of the ball based on timePassed and those physics parameters (speed, acceleration, etc.).
Or, if you prefer, you could just skip the render() call when the time passed is too small, instead of putting the program to sleep.
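A minimal sketch of the first idea (the ball structure and the constants are made up for illustration):
struct Ball {
    double y        = 100.0;   // height in metres
    double velocity = 0.0;     // metres per second, positive = up
};

const double GRAVITY = -9.81;  // metres per second squared

// Advance and draw the ball using only the elapsed time since the last frame.
void render(Ball& ball, double timePassed)           // timePassed in seconds
{
    ball.velocity += GRAVITY * timePassed;            // integrate acceleration
    ball.y        += ball.velocity * timePassed;      // integrate velocity
    // ... draw the ball at ball.y ...
}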

Related

pthread_cond_timedwait timing out late when large load put on CPU

In writing unit tests for an object, I am noticing that a pthread_cond_timedwait does not timeout soon enough when large loads are put upon the CPU. If these loads are not put on the CPU, everything works fine. When loads are put on to the system, however, I find that no matter the amount of time I set the timeout to, the true delay is off by about 50-100ms.
For example, here is a printout from a single interval of the program, where the last and current times are found using the function GetTimeInMs.
// Printout, values are in ms
Last: 89799240
Current: 89799440
Period Length: 200
Expected Period: 100
From all I have read this issue is usually caused by using relative times instead of absolute times, but as far as I can tell we are using absolute times correctly. If you wonderful people could help me figure out what is being done wrong here I would be very grateful.
The function utilizing timedwait is shown here. Note that based off of timing debugging I have done, I know the extra time generated is done via the timedwait call, so I have not included other code that would not be necessary.
bool func(unsigned long long int time = 100) // ms
{
    pthread_mutex_lock(&m_Mutex);
    if (0 == m_CurrentCount)
    {
        // Current time + delay, in ns
        unsigned long long int absnanotime = (GetTimeInMs() + time) * 1000000ULL;
        struct timespec ts;
        ts.tv_nsec = absnanotime % 1000000000ULL;
        ts.tv_sec  = absnanotime / 1000000000ULL;
        do
        {
            if (0 != pthread_cond_timedwait(&m_Condition, &m_Mutex, &ts))
            {
                // In the case I am testing, I hope to get here via timeout in 100 ms
                pthread_mutex_unlock(&m_Mutex);
                return false;
            }
        }
        while (!m_CurrentCount);
    }
    pthread_mutex_unlock(&m_Mutex);
    return true;
}
unsigned long long int GetTimeInMs()
{
    unsigned long long int time;
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    time = ts.tv_nsec + ts.tv_sec * 1000000000ULL;
    time = time / 1000000ULL; // Converts to ms
    return time;
}
The code used to initialize the class variables used in func.
void init()
{
    pthread_mutex_init(&m_Mutex, NULL);
    pthread_condattr_init(&m_Attr);
    pthread_condattr_setclock(&m_Attr, CLOCK_MONOTONIC);
    pthread_cond_init(&m_Condition, &m_Attr);
}
The CPU eater thread which simulates CPU load is running the following while loop.
void cpuEatingThread()
{
    while (false == m_ShutdownRequested)
    {
        // m_UselessFoo is of type float*
        m_UselessFoo = new float(1.23423525);
        delete m_UselessFoo;
    }
}
It's likely that, when the wait times out, the thread becomes ready without any priority boost or other such action. If the box is loaded up, the newly ready thread may not get to run immediately.
It's common to apply temporary priority boosts to threads that become ready on signals - this tends to improve overall performance in the 'usual' case, where the signal arrives before the timeout. The timeout is more of an 'unusual' event, often signaling some sort of failure that will not be repeated, and so threads becoming ready on timeout can wait their turn :)
For timed waits in general, the requirement is that they will wait at least as long as their argument. If you want precise times, this is not the right tool; you'll need something that guarantees particular times, and that's generally only available in a real-time operating system (RTOS).

Is this a good way to lock a loop on 60 loops per second?

I have a game with Bullet Physics as the physics engine. The game is online multiplayer, so I thought I'd try the Source Engine approach to deal with physics sync over the net. In the client I use GLFW, so the FPS limit is working there by default (at least I think that's because of GLFW). But on the server side there are no graphics libraries, so I need to "lock" the loop that simulates the world and steps the physics engine to 60 "ticks" per second.
Is this the right way to lock a loop to run 60 times a second (a.k.a. 60 "fps")?
void World::Run()
{
    m_IsRunning = true;
    long limit = (1 / 60.0f) * 1000;
    long previous = milliseconds_now();
    while (m_IsRunning)
    {
        long start = milliseconds_now();
        long deltaTime = start - previous;
        previous = start;

        std::cout << m_Objects[0]->GetObjectState().position[1] << std::endl;
        m_DynamicsWorld->stepSimulation(1 / 60.0f, 10);

        long end = milliseconds_now();
        long dt = end - start;
        if (dt < limit)
        {
            std::this_thread::sleep_for(std::chrono::milliseconds(limit - dt));
        }
    }
}
Is it ok to use std::thread for this task?
Is this approach efficient enough?
Will the physics simulation be stepped 60 times a second?
P.S
The milliseconds_now() looks like this:
long long milliseconds_now()
{
    static LARGE_INTEGER s_frequency;
    static BOOL s_use_qpc = QueryPerformanceFrequency(&s_frequency);
    if (s_use_qpc) {
        LARGE_INTEGER now;
        QueryPerformanceCounter(&now);
        return (1000LL * now.QuadPart) / s_frequency.QuadPart;
    }
    else {
        return GetTickCount();
    }
}
Taken from: https://gamedev.stackexchange.com/questions/26759/best-way-to-get-elapsed-time-in-miliseconds-in-windows
If you want to limit the rendering to a maximum FPS of 60, it is very simple:
Each frame, just check whether the game is running too fast; if so, wait. For example:
while ( timeLimitedLoop )
{
    float framedelta = ( timeNow - timeLast );   // timeNow is refreshed by your clock source each iteration
    timeLast = timeNow;
    for (auto& myObjectOrCalculation : allItemsToProcess)
    {
        myObjectOrCalculation->processThisIn60thOfSecond(framedelta);
    }
    render(); // if display needed
}
Please note that if vertical sync is enabled, rendering will already be limited to the frequency of your vertical refresh (typically 50 or 60 Hz).
If, however, you wish the logic locked at 60 fps, that's a different matter: you will have to segregate your display and logic code in such a way that the logic runs at a maximum of 60 fps, and modify the code so that you have a fixed time-interval loop and a variable time-interval loop (as above, and as sketched below). Good sources to look at are "fixed timestep" and "variable timestep" (Link 1, Link 2, and the old trusty Google search).
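A minimal sketch of the fixed-timestep pattern those articles describe (updateLogic and renderFrame are placeholders for your own functions):
#include <chrono>
#include <thread>

// Placeholder stand-ins for the game's real functions.
void updateLogic(double /*fixedDt*/) { /* step the physics/logic here */ }
void renderFrame()                   { /* draw here */ }

void runFixedTimestep(bool& running)
{
    using clock = std::chrono::steady_clock;
    const std::chrono::nanoseconds tick(1000000000LL / 60);   // 60 logic updates per second

    auto previous = clock::now();
    std::chrono::nanoseconds lag(0);
    while (running)
    {
        auto now = clock::now();
        lag += now - previous;
        previous = now;

        while (lag >= tick)             // logic catches up in fixed steps
        {
            updateLogic(1.0 / 60.0);
            lag -= tick;
        }
        renderFrame();                  // rendering runs at whatever rate is left over
        std::this_thread::sleep_for(std::chrono::milliseconds(1));   // don't spin flat out
    }
}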
Note on your code:
Because you sleep for the whole of (1/60th of a second minus the already elapsed time), you can easily miss the correct timing; change the single sleep into a loop that sleeps in small steps and re-checks the elapsed time:
instead of
    if (dt < limit)
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(limit - dt));
    }
change to
    while (dt < limit)
    {
        // sleep in small slices (1 ms here, or whatever fine-grained step you desire)
        // and re-measure, instead of trusting one big sleep
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        dt = milliseconds_now() - start;
    }
Hope this helps, however let me know if you need more info:)

Performance measurement: time vs tick?

What is the best way to ensure that real-time performance are achieved, with a 2 thread program running on 1 or 2 cores ? boost::timer or RDTSC ?
We started from that code
boost::timer t;
p.f(frame);
max_time_per_frame = std::max(max_time_per_frame, t.elapsed());
... where p is an instance of Proc.
class Proc {
public:
    Proc() : _frame_counter(0) {}

    // that function must be called for each video frame and take less than 1/fps seconds
    // 24 fps => 1/24 => < 0.04 seconds.
    void f(unsigned char * const frame)
    {
        processFrame(frame); // that's the most important part

        // that part runs every 240 frames and should not affect
        // the processFrame flow!
        if (_frame_counter % 240 == 0)
        {
            do_something_more();
        }
        _frame_counter++;
    }
private:
    unsigned int _frame_counter;
};
So it runs in a single-thread/single-core way, and we observed that max_time_per_frame is higher than the target time because of the do_something_more processing.
To remove those processing-time spikes, we moved every do_something_more call into a separate thread, as in the pseudo-code below.
class Proc {
public:
    Proc() : _frame_counter(0) {
        t = start_thread ( do_something_more_thread );
    }

    // that function must be called for each video frame and take less than 1/fps seconds
    // 24 fps => 1/24 => < 0.04 seconds.
    void f(unsigned char * const frame)
    {
        processFrame(frame); // that's the most important part

        // that part runs every 240 frames and should not affect
        // the processFrame flow!
        if (_frame_counter % 240 == 0)
        {
            sem.up();
        }
        _frame_counter++;
    }

    void do_something_more_thread()
    {
        while (1)
        {
            sem.down();
            do_something_more();
        }
    }
private:
    unsigned int _frame_counter;
    semaphore sem;
    thread t;
};
I always start my program on 1 or 2 cores, so I use start /AFFINITY 1 prog.exe or start /AFFINITY 3 prog.exe.
From the time point of view, everything is OK: max_time_per_frame stays below our target, close to the average of 0.02 seconds/frame.
But if I dump the number of ticks spent in f, using RDTSC:
#include <intrin.h>
...
unsigned long long getTick()
{
    return __rdtsc();
}

void f(unsigned char * const frame)
{
    unsigned long long s = getTick();

    processFrame(frame); // that's the most important part

    // that part runs every 240 frames and should not affect
    // the processFrame flow!
    if (_frame_counter % 240 == 0)
    {
        sem.up();
    }
    _frame_counter++;

    unsigned long long e = getTick();
    dump(e - s);
}
With start /AFFINITY 3 prog.exe, the max_tick_per_frame was stable and as expected: I saw one thread at 100% of one core, and the second thread running at a normal pace on the second core.
With start /AFFINITY 1 prog.exe, I saw only one core at 100% (as expected), but the do_something_more computation time does not seem to be spread over time by interleaved thread execution. In fact, at regular intervals, I saw a huge spike in the tick count.
So the question is: why? Is time the only interesting measure? Does the tick count make sense when running software on one core (frequency boost)?
Although you'll never get true real-time performance out of Windows, you can reduce the pitfalls of RDTSC by using the Windows API.
Here is a small code chunk that takes advantage of the API.
#include <Windows.h>
#include <stdio.h>

int main(int argc, char* argv[])
{
    double timeTaken;
    LARGE_INTEGER frequency;
    LARGE_INTEGER firstCount;
    LARGE_INTEGER endCount;

    /*-- give us the highest priority available --*/
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);

    /*-- get the frequency of the timer we are using --*/
    QueryPerformanceFrequency(&frequency);

    /*-- get the timer's current tick --*/
    QueryPerformanceCounter(&firstCount);

    /*-- some pause --*/
    Sleep(1);

    /*-- get the timer's current tick --*/
    QueryPerformanceCounter(&endCount);

    /*-- calculate the time passed, in milliseconds --*/
    timeTaken = (double)(endCount.QuadPart - firstCount.QuadPart) / (double)(frequency.QuadPart / 1000);
    printf("Time: %lf", timeTaken);
    return 0;
}
You can also use:
#include <Mmsystem.h>

if (timeBeginPeriod(1) == TIMERR_NOCANDO) {
    printf("TIMER could not be set to 1ms\n");
}
/*-- your code here --*/
timeEndPeriod(1);
But this will change the global Windows timer resolution to whatever interval you set (or at least attempt to), so I wouldn't recommend this approach unless you are 100% certain you are the only one who will run this program, as it may have unintended side effects on other programs.
Based on the comment about the REALTIME_PRIORITY_CLASS, I added the following line in a test program.
#define NOMINMAX
#include <windows.h>
....
SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS);
And now the tick count I got from RDTSC looks better: the huge spike I saw before on one frame is now spread over multiple frames.
As I wanted to keep my code portable and create some scheduling opportunities, I yielded the additional thread at a specific point using:
boost::this_thread::yield();
and with that change I obtained the scheduling and the RDTSC values I expected, without having to configure the priority!
Thanks for all the help and advice.

C++ Setting Speed of While Loop per Second

I am relatively new to C++, so I don't have a huge amount of experience. I have learned Python, and I am trying to make an improved C++ version of a Python program I wrote. However, I want it to work in real time, so I need to set the speed of a while loop. I'm sure there is an answer, but I couldn't find it. I want code comparable to this:
rate(timeModifier * (1/dt))
This was the code I used in Python. I can set a variable dt to make calculations more precise, and timeModifier to double or triple the speed (1 sets it to real time). This means that the program will go through the loop 1/dt times per second. I understand I can include time.h at the top, but I guess I am too new to C++ to understand how to translate this to my needs.
You could write your own timer class:
#include <ctime>

class Timer {
private:
    unsigned long startTime;
public:
    void start() {
        startTime = clock();
    }
    unsigned long elapsedTime() {
        return ((unsigned long) clock() - startTime) / CLOCKS_PER_SEC;
    }
    bool isTimeout(unsigned long seconds) {
        return elapsedTime() >= seconds;
    }
};
int main()
{
    unsigned long dt = 10; // in seconds
    Timer t;
    t.start();
    while (true)
    {
        if (t.elapsedTime() < dt)
        {
            // do something to pass time, as a busy-wait or sleep
        }
        else
        {
            // do something else
            t.start(); // reset the timer
        }
    }
}
Note that busy-waits are discouraged, since they hog the CPU. If you don't need to do anything, use the Sleep function (Windows) or usleep (Linux). For more information on making timers in C++, see this link.
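If your compiler supports C++11, here is a sketch that mirrors the Python rate(timeModifier * (1/dt)) call using <chrono> and an absolute deadline, which avoids the drift that per-iteration sleeps accumulate:
#include <chrono>
#include <thread>

int main()
{
    using clock = std::chrono::steady_clock;

    const double dt = 1.0 / 60.0;      // seconds per iteration
    const double timeModifier = 1.0;   // 1 = real time, 2 = double speed, ...
    const auto period = std::chrono::duration_cast<clock::duration>(
        std::chrono::duration<double>(dt / timeModifier));

    auto next = clock::now() + period;
    while (true)
    {
        // ... per-iteration work goes here ...

        std::this_thread::sleep_until(next);   // wait for the absolute deadline
        next += period;
    }
}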
You can't do it in the same manner in C++. You need to manually call some kind of sleep function in the calculation loop: Sleep on Windows or usleep on *NIX.
It's been a while since I've done something like this, but something like this will work:
#include <time.h>

time_t t2, t1 = time(NULL);
while (CONDITIONS)
{
    t2 = time(NULL);
    if (difftime(t2, t1) > timeModifier)
    {
        // DO the stuff!
        t1 = time(NULL);
    }
}
I should note, however, that I'm not familiar with the precision of this method; I think it measures the difference in whole seconds.
If you need something more precise, use the clock() function, which returns the processor time consumed by the program in clock ticks, with CLOCKS_PER_SEC ticks per second.
Perhaps something like this:
#include <time.h>

clock_t t2, t1 = clock();
while (CONDITIONS)
{
    t2 = clock();
    // someTimeElapsed is the desired interval, expressed in clock ticks
    if ((t2 - t1) > someTimeElapsed * timeModifier)
    {
        // DO the stuff!
        t1 = clock();
    }
}
Update:
You can even yield the CPU to other threads and processes by adding this after the end of the if statement:
else
{
    usleep(10000); // sleep for ten milliseconds (chosen because of the precision of clock())
}
Depending on the accuracy you need, and your platform, you could use usleep. This allows you to set the pause time down to microseconds:
#include <unistd.h>

int usleep(useconds_t useconds);
Remember that your loop will always take longer than this because of the inherent processing time of the rest of the loop, but it's a start. For anything more accurate, you'd probably need to look at timer-based callbacks.
You should really create a new thread and have it do the timing so that it remains unaffected by the processing work done in the loop.
WARNING: Pseudo code... just to give you an idea of how to start.
Thread* tThread = CreateTimerThread(1000);
tThread->run();

while ( conditionNotMet() )
{
    tThread->waitForTimer();
    doWork();
}
CreateTimerThread() should return the thread object you want, and run would be something like:
run()
{
    while ( false == shutdownLatch() )
    {
        Sleep( timeout );
        pulseTimerEvent();
    }
}

waitForTimer()
{
    WaitForSingleObject( m_handle );
    return;
}
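For what it's worth, here is one concrete way to write that timer thread with the standard library instead of raw Win32 calls (class and member names are my own, and pulses that fire while doWork() is still running are simply dropped):
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

class TimerThread
{
public:
    explicit TimerThread(std::chrono::milliseconds period)
        : period_(period), worker_([this] { run(); }) {}

    ~TimerThread()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            shutdown_ = true;
        }
        cv_.notify_all();
        worker_.join();
    }

    // Block the caller until the next timer pulse.
    void waitForTimer()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        const unsigned long long seen = pulses_;
        cv_.wait(lock, [&] { return pulses_ != seen || shutdown_; });
    }

private:
    void run()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!shutdown_)
        {
            // Wake everyone waiting in waitForTimer() once per period.
            cv_.wait_for(lock, period_, [&] { return shutdown_; });
            ++pulses_;
            cv_.notify_all();
        }
    }

    std::chrono::milliseconds period_;
    std::mutex mutex_;
    std::condition_variable cv_;
    unsigned long long pulses_ = 0;
    bool shutdown_ = false;
    std::thread worker_;   // declared last so run() only sees initialised members
};
Used like the pseudo code above: construct TimerThread ticker(std::chrono::milliseconds(1000)); and call ticker.waitForTimer(); at the top of the work loop.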
Under Windows you can use QueryPerformanceCounter and, while polling the time (e.g. within another while loop), call Sleep(0) to allow other threads to continue operation.
Remember that Sleep is highly inaccurate. For full control, just run a loop without operations, but then you'll use 100% of a core. To relax the strain on the CPU you can call Sleep(10), etc.

pthread sleep function, cpu consumption

First off, sorry for my far-from-perfect English.
I recently wrote myself a daemon for Linux (to be exact, an OpenWRT router) in C++ and I ran into a problem.
There are a few threads: one for each open TCP connection, a main thread waiting for new TCP connections, and, as I call it, a commander thread that checks status.
Everything works fine, but my CPU is always at 100%. I know that it's because of the commander code:
void *CommanderThread(void* arg)
{
    Commander* commander = (Commander*)arg;
    pthread_detach(pthread_self());
    clock_t endwait;
    while (true)
    {
        uint8_t temp;
        endwait = clock () + (int)(1 * CLOCKS_PER_SEC);
        for (int i = 0; i < commander->GetCount(); i++)
        {
            ptrRelayBoard rb = commander->GetBoard(i);
            if (rb != NULL)
                rb->Get(0x01, &temp);
        }
        while (clock() < endwait);   // busy-wait until one second has passed
    }
    return NULL;
}
As you can see, the program does its work every 1 s. Time is not critical here. I know that the CPU is always checking whether the time has passed. I tried to do something like this:
while (clock() < endwait)
    usleep(200);
But the usleep (and sleep) calls seem to freeze the clock() increment (it always reads a constant value after the usleep).
Is there any solution, a ready-made function (something like pthread_sleep(20ms)), or a workaround for my problem? Maybe I should access the main clock somehow?
Here it's not so critical: I can pretty much check how long the status check took (latch clock() before, compare it after) and compute the value to pass to usleep. But in another thread I would like to use this form.
Does usleep put the whole process to sleep?
I'm currently debugging it on Cygwin, but I don't think the problem lies there.
Thanks for any answers and suggestions; they are much appreciated.
J.L.
If it doesn't need to be exactly 1 s, then just usleep for a second. usleep and sleep put the current thread into an efficient wait state that lasts at least the amount of time you requested (after which the thread becomes eligible for scheduling again).
If you aren't trying to hit a near-exact time, there's no need to check clock().
I have resolved it another way.
#include <sys/time.h>

#define CLOCK_US_IN_SECOND 1000000

static long myclock()
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (tv.tv_sec * CLOCK_US_IN_SECOND) + tv.tv_usec;
}

void *MainThread(void* arg)
{
    Commander* commander = (Commander*)arg;
    pthread_detach(pthread_self());
    long endwait;
    while (true)
    {
        uint8_t temp;
        endwait = myclock() + (int)(1 * CLOCK_US_IN_SECOND);
        for (int i = 0; i < commander->GetCount(); i++)
        {
            ptrRelayBoard rb = commander->GetBoard(i);
            if (rb != NULL)
                rb->Get(0x01, &temp);
        }
        while (myclock() < endwait)
            usleep((int)(0.05 * CLOCK_US_IN_SECOND));   // sleep ~50 ms between checks
    }
    return NULL;
}
Bear in mind that this code is vulnerable to the system time changing during execution. I don't have an idea how to avoid that, but in my case it's not really important.
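If that ever becomes a concern, one option (my suggestion, not part of the original fix) is to base myclock() on CLOCK_MONOTONIC, which is immune to changes of the system time; on older glibc versions this needs -lrt at link time, and endwait would need to become a long long as well:
#include <time.h>

#define CLOCK_US_IN_SECOND 1000000

// Replacement for myclock() above: CLOCK_MONOTONIC keeps counting steadily
// even if someone adjusts the wall-clock time.
static long long myclock_monotonic(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * CLOCK_US_IN_SECOND + ts.tv_nsec / 1000;
}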