C++ How to make precise frame rate limit? - c++

I'm trying to create a game using C++ and I want to create limit for fps but I always get more or less fps than I want. When I look at games that have fps limit it's always precise framerate. Tried using Sleep() std::this_thread::sleep_for(sleep_until). For example Sleep(0.01-deltaTime) to get 100 fps but ended up with +-90fps.
How do these games handle fps so precisely when any sleeping isn't precise?
I know I can use infinite loop that just checks if time passed but it's using full power of CPU but I want to decrease CPU usage by this limit without VSync.

Yes, sleep is usually inaccurate. That is why you sleep for less than the actual time it takes to finish the frame. For example, if you need 5 more milliseconds to finish the frame, then sleep for 4 milliseconds. After the sleep, simply do a spin-lock for the rest of the frame. Something like
float TimeRemaining = NextFrameTime - GetCurrentTime();
Sleep(ConvertToMilliseconds(TimeRemaining) - 1);
while (GetCurrentTime() < NextFrameTime) {};
Edit: as stated in another answer, timeBeginPeriod() should be called to increase the accuracy of Sleep(). Also, from what I've read, Windows will automatically call timeEndPeriod() when your process exits if you don't before then.

You could record the time point when you start, add a fixed duration to it and sleep until the calculated time point occurs at the end (or beginning) of every loop. Example:
#include <chrono>
#include <iostream>
#include <ratio>
#include <thread>
template<std::intmax_t FPS>
class frame_rater {
public:
frame_rater() : // initialize the object keeping the pace
time_between_frames{1}, // std::ratio<1, FPS> seconds
tp{std::chrono::steady_clock::now()}
{}
void sleep() {
// add to time point
tp += time_between_frames;
// and sleep until that time point
std::this_thread::sleep_until(tp);
}
private:
// a duration with a length of 1/FPS seconds
std::chrono::duration<double, std::ratio<1, FPS>> time_between_frames;
// the time point we'll add to in every loop
std::chrono::time_point<std::chrono::steady_clock, decltype(time_between_frames)> tp;
};
// this should print ~10 times per second pretty accurately
int main() {
frame_rater<10> fr; // 10 FPS
while(true) {
std::cout << "Hello world\n";
fr.sleep(); // let it sleep any time remaining
}
}

The accepted answer sounds really bad. It would not be accurate and it would burn the CPU!
Thread.Sleep is not accurate because you have to tell it to be accurate (by default is about 15ms accurate - means that if you tell it to sleep 1ms it could sleep 15ms).
You can do this with Win32 API call to timeBeginPeriod & timeEndPeriod functions.
Check MSDN for more details -> https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod
(I would comment on the accepted answer but still not having 50 reputation)

Be very careful when implementing any wait that is based on scheduler sleep.
Most OS schedulers have higher latency turn-around for a wait with no well-defined interval or signal to bring the thread back into the ready-to-run state.
Sleeping isn't inaccurate per-se, you're just approaching the problem all wrong. If you have access to something like DXGI's Waitable Swapchain, you synchronize to the DWM's present queue and get really reliable low-latency timing.
You don't need to busy-wait to get accurate timing, a waitable timer will give you a sync object to reschedule your thread.
Whatever you do, do not use the currently accepted answer in production code. There's an edge case here you WANT TO AVOID, where Sleep (0) does not yield CPU time to higher priority threads. I've seen so many game devs try Sleep (0) and it's going to cause you major problems.

Use a timer.
Some OS's can provide special functions. For example, for Windows you can use SetTimer and handle its WM_TIMER messages.
Then calculate the frequency of the timer. 100 fps means that the timer must fire an event each 0.01 seconds.
At the event handler for this timer-event you can do your rendering.
In case the rendering is slower than the desired frequency then use a syncro flag OpenGL sync and discard the timer-event if the previous rendering is not complete.

You may set a const fps variable to your desired frame rate, then you can update your game if the elapsed time from last update is equal or more than 1 / desired_fps.
This will probably work.
Example:
const /*or constexpr*/ int fps{60};
// then at update loop.
while(running)
{
// update the game timer.
timer->update();
// check for any events.
if(timer->ElapsedTime() >= 1 / fps)
{
// do your updates and THEN renderer.
}
}

Related

I'm looking to improve or request my current delay / sleep method. c++

Currently I am coding a project that requires precise delay times over a number of computers. Currently this is the code I am using I found it on a forum. This is the code below.
{
LONGLONG timerResolution;
LONGLONG wantedTime;
LONGLONG currentTime;
QueryPerformanceFrequency((LARGE_INTEGER*)&timerResolution);
timerResolution /= 1000;
QueryPerformanceCounter((LARGE_INTEGER*)&currentTime);
wantedTime = currentTime / timerResolution + ms;
currentTime = 0;
while (currentTime < wantedTime)
{
QueryPerformanceCounter((LARGE_INTEGER*)&currentTime);
currentTime /= timerResolution;
}
}
Basically the issue I am having is this uses alot of CPU around 16-20% when I start to call on the function. The usual Sleep(); uses Zero CPU but it is extremely inaccurate from what I have read from multiple forums is that's the trade-off when you trade accuracy for CPU usage but I thought I better raise the question before I set for this sleep method.
The reason why it's using 15-20% CPU is likely because it's using 100% on one core as there is nothing in this to slow it down.
In general, this is a "hard" problem to solve as PCs (more specifically, the OSes running on those PCs) are in general not made for running real time applications. If that is absolutely desirable, you should look into real time kernels and OSes.
For this reason, the guarantee that is usually made around sleep times is that the system will sleep for atleast the specified amount of time.
If you are running Linux you could try using the nanosleep method (http://man7.org/linux/man-pages/man2/nanosleep.2.html) Though I don't have any experience with it.
Alternatively you could go with a hybrid approach where you use sleeps for long delays, but switch to polling when it's almost time:
#include <thread>
#include <chrono>
using namespace std::chrono_literals;
...
wantedtime = currentTime / timerResolution + ms;
currentTime = 0;
while(currentTime < wantedTime)
{
QueryPerformanceCounter((LARGE_INTEGER*)&currentTime);
currentTime /= timerResolution;
if(currentTime-wantedTime > 100) // if waiting for more than 100 ms
{
//Sleep for value significantly lower than the 100 ms, to ensure that we don't "oversleep"
std::this_thread::sleep_for(50ms);
}
}
Now this is a bit race condition prone, as it assumes that the OS will hand back control of the program within 50ms after the sleep_for is done. To further combat this you could turn it down (to say, sleep 1ms).
You can set the Windows timer resolution to minimum (usually 1 ms), to make Sleep() accurate up to 1 ms. By default it would be accurate up to about 15 ms. Sleep() documentation.
Note that your execution can be delayed if other programs are consuming CPU time, but this could also happen if you were waiting with a timer.
#include <timeapi.h>
// Sleep() takes 15 ms (or whatever the default is)
Sleep(1);
TIMECAPS caps_;
timeGetDevCaps(&caps_, sizeof(caps_));
timeBeginPeriod(caps_.wPeriodMin);
// Sleep() now takes 1 ms
Sleep(1);
timeEndPeriod(caps_.wPeriodMin);

Busy Loop/Spinning sometimes takes too long under Windows

I'm using a windows 7 PC to output voltages at a rate of 1kHz. At first I simply ended the thread with sleep_until(nextStartTime), however this has proven to be unreliable, sometimes working fine and sometimes being of by up to 10ms.
I found other answers here saying that a busy loop might be more accurate, however mine for some reason also sometimes takes too long.
while (true) {
doStuff(); //is quick enough
logDelays();
nextStartTime = chrono::high_resolution_clock::now() + chrono::milliseconds(1);
spinStart = chrono::high_resolution_clock::now();
while (chrono::duration_cast<chrono::microseconds>(nextStartTime -
chrono::high_resolution_clock::now()).count() > 200) {
spinCount++; //a volatile int
}
int spintime = chrono::duration_cast<chrono::microseconds>
(chrono::high_resolution_clock::now() - spinStart).count();
cout << "Spin Time micros :" << spintime << endl;
if (spinCount > 100000000) {
cout << "reset spincount" << endl;
spinCount = 0;
}
}
I was hoping that this would work to fix my issue, however it produces the output:
Spin Time micros :9999
Spin Time micros :9999
...
I've been stuck on this problem for the last 5 hours and I'd very thankful if somebody knows a solution.
According to the comments this code waits correctly:
auto start = std::chrono::high_resolution_clock::now();
const auto delay = std::chrono::milliseconds(1);
while (true) {
doStuff(); //is quick enough
logDelays();
auto spinStart = std::chrono::high_resolution_clock::now();
while (start > std::chrono::high_resolution_clock::now() + delay) {}
int spintime = std::chrono::duration_cast<std::chrono::microseconds>
(std::chrono::high_resolution_clock::now() - spinStart).count();
std::cout << "Spin Time micros :" << spintime << std::endl;
start += delay;
}
The important part is the busy-wait while (start > std::chrono::high_resolution_clock::now() + delay) {} and start += delay; which will in combination make sure that delay amount of time is waited, even when outside factors (windows update keeping the system busy) disturb it. In case that the loop takes longer than delay the loop will be executed without waiting until it catches up (which may be never if doStuff is sufficiently slow).
Note that missing an update (due to the system being busy) and then sending 2 at once to catch up might not be the best way to handle the situation. You may want to check the current time inside doStuff and abort/restart the transmission if the timing is wrong by more then some acceptable amount.
On Windows I dont think its possible to ever get such precise timing, because you can not garuntee your thread is actually running at the time you desire. Even with low CPU usage and setting your thread to real time priority, it can still be interuptted (Hardware interupts as I understand. Never fully investigate but even a simple while(true) ++i; type loop at realtime Ive seen get interupted then moved between CPU cores). While such interrupts and switching for a realtime thread is very quick, its still significant if your trying to directly drive a signal without buffering.
Instead you really want to read and write buffers of digital samples (so at 1KHz each sample is 1ms). You need to be sure to queue another buffer before the last one is completed, which will constrain how small they can be, but at 1KHz at realtime priority if the code is simple and no other CPU contention a single sample buffer (1ms) might even be possible, which is at worst 1ms extra latency over "immediate" but you would have to test. You then leave it up to the hardware and its drivers to handle the precise timing (e.g. make sure each output sample is "exactly" 1ms to the accuracy the vendor claims).
This basically means your code only has to be accurate to 1ms in worst case, rather than trying to persue somthing far smaller than the OS really supports such as microsecond accuracy.
As long as you are able to queue a new buffer before the hardware used up the previous buffer, it will be able to run at the desired frequency without issue (to use audio as an example again, while the tolerated latencies are often much higher and thus the buffers as well, if you overload the CPU you can still sometimes hear auidble glitches where an application didnt queue up new raw audio in time).
With careful timing you might even be able to get down to a fraction of a millisecond by waiting to process and queue your next sample as long as possible (e.g. if you need to reduce latency between input and output), but remember that the closer you cut it the more you risk submitting it too late.

Can i retrieve microseconds or very accurate milliseconds on c++ on windows?

So I made a game loop that uses SDL_Delay function to cap the frames per second, it look like this:
//While the user hasn't qui
while( stateID != STATE_EXIT )
{
//Start the frame timer
fps.start();
//Do state event handling
currentState->handle_events();
//Do state logic
currentState->logic();
//Change state if needed
change_state();
//Do state rendering
currentState->render();
//Update the screen
if( SDL_Flip( screen ) == -1 )
{
return 1;
}
//Cap the frame rate
if( fps.get_ticks() < 1000 / FRAMES_PER_SECOND )
{
SDL_Delay( ( 1000 / FRAMES_PER_SECOND ) - fps.get_ticks() );
}
}
So when I run my games on 60 frames per second (which is the "eye cap" I assume) I can still see laggy type of motion, meaning i see the frames appearing independently causing unsmooth motion.
This is because apparently SDL_Delay function is not too accurate, causing +,- 15 milliseconds or something difference between frames greater than whatever I want it to be.
(all these are just my assumptions)
so I am just searching fo a good and accurate timer that will help me with this problem.
any suggestions?
I think there is a similar question in How to make thread sleep less than a millisecond on Windows
But as a game programmer myself, I don't rely on sleep functions to manage frame-rate (the parameter they take is just a minimum). I just draw stuff on screen as fast as I can. I have a bunch of function calls in my game loop, and then I keep track of how often I'm calling them. For instance, I check input quite often (1000x/second) to make the game more responsive, but I don't check the network inbox more than 100x/second.
For example:
#define NW_CHECK_INTERVAL 10
#define INPUT_CHECK_INTERVAL 1
uint32_t last_nw_check = 0, last_input_check = 0;
while (game_running) {
uint32_t now = SDL_GetTicks();
if (now - last_nw_check > NW_CHECK_INTERVAL) {
check_network();
last_nw_check = now;
}
if (now - last_input_check > INPUT_CHECK_INTERVAL) {
check_input();
last_input_check = now;
}
check_video();
// and so on...
}
Use the QueryPerformanceCounter / Frequency for that.
LARGE_INTEGER start, end, tps; //tps = ticks per second
QueryPerformanceFrequency( &tps );
QueryPerformanceCounter( &start );
QueryPerformanceCounter( &end );
int usPassed = (end.QuadPart - start.QuadPart) * 1000000 / tps.QuadPart;
Here's a small wait function I had created for timing midi sequences using QueryPerformanceCounter:
void wait(int waitTime) {
LARGE_INTEGER time1, time2, freq;
if(waitTime == 0)
return;
QueryPerformanceCounter(&time1);
QueryPerformanceFrequency(&freq);
do {
QueryPerformanceCounter(&time2);
} while((time2.QuadPart - time1.QuadPart) * 1000000ll / freq.QuadPart < waitTime);
}
To convert ticks to microseconds, calculate the difference in ticks, multiply by 1,000,000 (microseconds/second) and divide by the frequency of ticks per second.
Note that some things may throw this off, for instance the precision of the high-resolution counter is not likely to be down to a single microsecond. For example, if you want to wait 10 microseconds and the precision/frequency is one tick every 6 microseconds, your 10 microsecond wait will actually be no less than 12 microseconds. Again, this frequency is system dependent and will vary from system to system.
Also, Windows is not a real-time operating system. A process may be preempted at any time and it is up to Windows to decide when the process is rescheduled. The application may be preempted in the middle of this function and not restarted again until long after the expected wait time has elapsed. There really isn't much you can do about it but you'll probably never notice it if it happens.
60 fame per second is just the frequency of power in US (50 in Europe, Africa and Asia are somehow mixed) and is the frequency of video refreshing for hardware comfortable reasons (It can be an integer multiple on more sophisticated monitors). It was a mandatory constrains for CRT dispaly, and it is still a comfortable reference for LCD (that's how frequently the frame buffer is uploaded to the display)
The eye-cap is no more than 20-25 fps - not to be confused with retina persistency, that's about one-half - and that's why TV interlace two squares upon every refresh.
independently on the timing accuracy, whatever hardware device cannot be updated during its buffer-scan (otherwise the image changes while it is shown, resulting in half-drawn broken frames), hence, if you go faster than one half of the device refresh you are queued behind it and forced to wait for it.
60 fps in a game loop serves only to help CPU manufacturers to sell new faster CPUs. Slow down under 25 and everything will look more fluid.
SDL_Delay:
This function waits a specified number of milliseconds before returning. It waits at least the specified time, but possible longer due to OS scheduling. The delay granularity is at least 10 ms. Some platforms have shorter clock ticks but this is the most common.
The actual delays observed with this function depend on OS settings. I'd suggest to look into the
Mutimedia Timer API, particulary into the timeBeginPeriod function, to adapt the interrupt frequency to your requirements.
Obtaining and Setting Timer Resolution shows an example how to change the interrupt period to about 1ms. This way you don't have the 15ms hickup anymore. BTW: Eye-catch period is about 40ms.
Obtaining fixed period timing can also be addressed by Waitable Timer Objects. But the use of mutimedia timers is mandatory to obtain decent resolution, no matter what.
Using other tools to improve the timing capabilities is discussed here.

C++ waiting a frame before carrying out an action

How would you wait a frame in c++.
I don't want the program to sleep or anything.
It would go soemthing like
Do this in this frame (1)
Continue with rest of program
Do this in the next frame (2)
where action 1 happens only in the first frame and action 2 happens only in the next frame. It would continue like this. 1, 2, 1 again, 2
I have the time between frames, I use c++ and i'm using Visual Studio 2008 to compile.
Edit:
I'm using Opengl my OS is Windows 7.
Frame - http://en.wikipedia.org/wiki/Frame_rate
like each image of the scene printed to the screen over a given time period
I'm making some assumptions here.
Suppose you have a model for which you wish to show the state. You might wish to maximise the CPU time spent evolving the model rather than rendering.
So you fix the target frame rate, at e.g. 25 fps.
Again, assume you have optimised rendering so that it can be done in much less than 0.04 seconds.
So you might want something like (pseudo-code):
Time lastRendertime = now();
while(forever)
{
Time current = now();
if ((current - lastRenderTime > 0.04))
{
renderEverything();
lastRenderTime = current;
}
else
{
evolveModelABit();
}
}
Of course, you probably have an input handler to break the loop. Note that this approach assumes that you do not want the model evolution affected by elapsed real time. If you do, and may games do, then pass in the current time to the evolveModelABit();.
For time functions on Windows, you can use:
LARGE_INTEGER frequency; // ticks per second
LARGE_INTEGER t1; // ticks
QueryPerformanceFrequency(&frequency);
QueryPerformanceCounter(&t1);
Note that this approach is suitable for a scientific type simulation. The model evolution will not depend on the frame rate, rendering etc, and gives the same result very time.
For a game, typically there is a push for maximising the fps. This means that the main loop is of the form:
Time lastRendertime = now();
while(forever)
{
Time current = now();
evolveModelABit(current, lastRenderTime);
renderEverything();
lastRenderTime = current;
}
If V-Sync is enabled, SwapBuffers will block the current thread until the next frame has been shown. So if you create a worker thread, and release a lock, or resume its execution right before the call of SwapBuffers your programm recieves the CPU time it would otherwise yield to the rest of the system during the wait-for-swap block. If the worker thread is manipulating GPU resources, it is a good idea using high resolution/performance counters to determine how much time is left until the swap, minus some margin and use this timing in the worker thread, so that the worker thread puts itself to sleep at about the time the swap happens, so that the GPU will not have to context switch between worker and renderer thread.

Limit iterations per time unit

Is there a way to limit iterations per time unit? For example, I have a loop like this:
for (int i = 0; i < 100000; i++)
{
// do stuff
}
I want to limit the loop above so there will be maximum of 30 iterations per second.
I would also like the iterations to be evenly positioned in the timeline so not something like 30 iterations in first 0.4s and then wait 0.6s.
Is that possible? It does not have to be completely precise (though the more precise it will be the better).
#FredOverflow My program is running
very fast. It is sending data over
wifi to another program which is not
fast enough to handle them at the
current rate. – Richard Knop
Then you should probably have the program you're sending data to send an acknowledgment when it's finished receiving the last chunk of data you sent then send the next chunk. Anything else will just cause you frustrations down the line as circumstances change.
Suppose you have a good Now() function (GetTickCount() is bad example, it's OS specific and has bad precision):
for (int i = 0; i < 1000; i++){
DWORD have_to_sleep_until = GetTickCount() + EXPECTED_ITERATION_TIME_MS;
// do stuff
Sleep(max(0, have_to_sleep_until - GetTickCount()));
};
You can check elapsed time inside the loop, but it may be not an usual solution. Because computation time is totally up to the performance of the machine and algorithm, people optimize it during their development time(ex. many game programmer requires at least 25-30 frames per second for properly smooth animation).
easiest way (for windows) is to use QueryPerformanceCounter(). Some pseudo-code below.
QueryPerformanceFrequency(&freq)
timeWanted = 1.0/30.0 //time per iteration if 30 iterations / sec
for i
QueryPerf(count1)
do stuff
queryPerf(count2)
timeElapsed = (double)(c2 - c1) * (double)(1e3) / double(freq) //time in milliseconds
timeDiff = timeWanted - timeElapsed
if (timeDiff > 0)
QueryPerf(c3)
QueryPerf(c4)
while ((double)(c4 - c3) * (double)(1e3) / double(freq) < timeDiff)
queryPerf(c4)
end for
EDIT: You must make sure that the 'do stuff' area takes less time than your framerate or else it doesn't matter. Also instead of 1e3 for milliseconds, you can go all the way to nanoseconds if you do 1e9 (if you want that much accuracy)
WARNING... this will eat your CPU but give you good 'software' timing... Do it in a separate thread (and only if you have more than 1 processor) so that any guis wont lock. You can put a conditional in there to stop the loop if this is a multi-threaded app too.
#FredOverflow My program is running very fast. It is sending data over wifi to another program which is not fast enough to handle them at the current rate. – Richard Knop
What you might need a buffer or queue at the receiver side. The thread that receives the messages from the client (like through a socket) get the message and put it in the queue. The actual consumer of the messages reads/pops from the queue. Of course you need concurrency control for your queue.
Besides the flow control methods mentioned, if you also have the need to maintain an accurate specific data sending rate in your sender part. Usually it can be done like this.
E.x. if you want to send at 10Mbps, create a timer of interval 1ms so it will call a predefined function every 1ms. Then in the timer handler function, by keep tracking of 2 static variables 1)Time elapsed since beginning of sending data 2)How much data in bytes have been sent up to last call, you can easily calculate how much data is needed to be sent in the current call (or just sleep and wait for next call).
By this way, you can do "streaming" of data in a very stable way with very little jitterness, and this is usually adopted in streaming of videos. Of course it also depends on how accurate the timer is.