Effective frame limiting - C++

I have a simulation that I am trying to convert to "real time". I say "real time" because it's okay for performance to dip if needed (slowing down time for the observers/clients too). However, if there is a small number of objects, I want to limit the performance so that it runs at a steady frame rate (~100 FPS in this case).
I tried sleep() and Sleep() for Linux and Windows respectively, but they don't seem to be accurate enough, as the FPS really dips to a fraction of what I was aiming for. I suppose this scenario is common for games, especially online games, but I was not able to find any helpful material on the subject. What is the preferable way of frame limiting? Is there a sleep method that can guarantee that it won't give up more time than what was specified?
Note: I'm running this on 2 different clusters (Linux and Windows) and all nodes only have built-in video. So I have to implement the limiting on both platforms, and it shouldn't be video-card based (if there is even such a thing). I also only need to implement the limiting on one thread/node, because there is already synchronization between nodes and the others would automatically be limited if one thread is properly limited.
Edit: some pseudocode showing how I implemented the current limiter:
while (ProcessControlMessages())
{
    uint64 tStart = _context.GetTimeMs64();
    SimulateFrame();
    uint64 newT = _context.GetTimeMs64();
    if (newT - tStart < DESIRED_FRAME_RATE_DURATION)
        this_thread::sleep_for(chrono::milliseconds(DESIRED_FRAME_RATE_DURATION - (newT - tStart)));
}
I was also wondering whether I could do the limiting every N frames, where N is a fraction of the desired frame rate. I'll give it a try and report back.

For games a frame limiter is usually inadequate. Instead, the methods that update the game state (in your case SimulateFrame()) are kept frame rate independent. E.g. if you want to move an object, the actual offset is the object's speed multiplied by the last frame's duration. Similarly, you can do this for all kinds of calculations.
This approach has the advantage that the user gets the maximum frame rate while maintaining the real-timeness. However, you should watch out that the frame durations don't get too small (< 1 ms), as this could result in inaccurate calculations. In this case a small sleep with a fixed duration could help.
This is how games usually handle this problem. You have to check if your simulation is appropriate for this technique, too.
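A minimal sketch of such a frame-rate-independent update, assuming a hypothetical object with a position and a speed (the names are illustrative, not from the question):

#include <chrono>

// Hypothetical object; the field names are just for illustration.
struct Object {
    double positionX = 0.0;
    double speedX    = 5.0;   // units per second
};

// The per-frame offset is speed * last frame's duration, so the object
// covers the same distance per real second at 30 FPS or at 1000 FPS.
void updateObject(Object& obj, std::chrono::duration<double> lastFrameDuration)
{
    obj.positionX += obj.speedX * lastFrameDuration.count();   // duration in seconds
}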

Instead of having each frame try to sleep for long enough to be a full frame, have them sleep so that the frames average out. Keep a global/thread-owned time count. For each frame, have a "desired earliest end time," calculated from the previous desired earliest end time rather than from the current time:
uint64 tGoalEndTime = _context.GetTimeMs64() + DESIRED_FRAME_RATE_DURATION;
while (ProcessControlMessages())
{
    SimulateFrame();
    uint64 end = _context.GetTimeMs64();
    if (end < tGoalEndTime) {
        this_thread::sleep_for(chrono::milliseconds(tGoalEndTime - end));
    } else {
        tGoalEndTime = end; // we ran over; pretend we didn't and rebase from now
    }
    tGoalEndTime += DESIRED_FRAME_RATE_DURATION;
}
Note: this uses your example's sleep_for because I wanted to show the minimum number of changes to enact it. sleep_until works better here.
The trick is that any frame that sleeps too long immediately causes the next few frames to rush to catch up.
Note: You cannot get timing within 2 ms (20% jitter at 100 FPS) on modern consumer OSs. The scheduling quantum for threads on most consumer OSs is around 100 ms, so the instant you sleep, you may sleep for multiple quantums before it is your turn again. sleep_until may use an OS-specific technique to get less jitter, but you can't rely on it.
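For reference, a sketch of the same loop using std::chrono::steady_clock and sleep_until (an absolute deadline), assuming the question's ProcessControlMessages() and SimulateFrame() exist; the 10 ms target is just the ~100 FPS figure from the question:

#include <chrono>
#include <thread>

bool ProcessControlMessages();   // assumed to exist, as in the question
void SimulateFrame();            // assumed to exist, as in the question

void RunLimitedLoop()
{
    using clock = std::chrono::steady_clock;
    constexpr auto FRAME_DURATION = std::chrono::milliseconds(10);   // ~100 FPS

    auto goal = clock::now() + FRAME_DURATION;
    while (ProcessControlMessages())
    {
        SimulateFrame();
        auto end = clock::now();
        if (end < goal) {
            std::this_thread::sleep_until(goal);   // sleep to an absolute deadline
        } else {
            goal = end;                            // ran over: rebase instead of piling up debt
        }
        goal += FRAME_DURATION;
    }
}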

Related

How to fix execution speed inconsistencies in C++

This is most noticeable in graphics programs. Let's take the OpenGL base program (a spinning triangle) as an example.
Whenever I run it normally, with no other apps open in the background, it spins slowly, but when I run a game in the background, it starts spinning like mad. It seems as if the computer doesn't allocate enough memory for the programs to run at maximum speed, and paradoxically, doing resource-consuming stuff accelerates it because it gets more memory.
The only way I have found to partially fix this is to put a higher value in the Sleep function; however, this doesn't fix it completely, nor is it a consistent solution, as other problems may arise from it. Is there any good way to fix this and make the program run consistently?
This mostly happens because you are not capping your FPS, so there is nothing preventing your render loop from being called as often as possible, and your logic (which controls the rotation) executes in the same loop.
What happens is that most GPUs have power management, so they keep their frequencies low when there is no demand. Opening an expensive game makes your GPU raise its clocks, so it renders a lot faster and your rendering loop gets called more often.
To prevent this (and to separate logic from rendering time in general) you must control the frame rate and use the time as an input for your rotation, something like:
using clock = std::chrono::steady_clock;
const auto TIME_PER_FRAME = std::chrono::milliseconds(16);   // target frame time, e.g. ~60 FPS

while (!exit) {
    auto frameStart = clock::now();
    render();
    auto delta = clock::now() - frameStart;
    if (delta < TIME_PER_FRAME)
        std::this_thread::sleep_for(TIME_PER_FRAME - delta);
    updateLogic(delta);   // rotation advances by measured time, not by a fixed step
}
For starters, you need to understand what's going on with your program. It has nothing to do with memory, and I don't see a reason to think about memory.
Opening other programs could make your CPU go faster because of the load (doubtful, but clearly more likely than memory allocation).
The other programs could be messing with some setting.
If you're using sleep(), signals can interrupt the call (no one ever looks at the return value of the function; there's a reason it is unsigned sleep(unsigned) and not void sleep(unsigned)).
If you can, don't use sleep. And if you're going to, check afterwards. sleep doesn't guarantee that the whole time has passed (IMHO bad design, but I'm not a POSIX fan).
The usual behaviour I think would be to have your function called periodically as a callback. If you're going to do some sort of delay or sleep, you should check that the time has passed.
Given that you want your logic tied to the rendering and you use some function like sleep that can be interrupted (based on the other answer):
while (!exit) {
    auto startOfFrame = now();
    render();
    auto toDelay = startOfFrame + TIME_PER_FRAME - now();
    while (toDelay > 0) {
        delay(toDelay);
        toDelay = startOfFrame + TIME_PER_FRAME - now();
    }
    updateLogic();
}

C++ Run only for a certain time

I'm writing a little game in C++ at the moment.
My game's while loop is always active; in this loop
I have a condition that checks whether the player is shooting.
Now I face the following problem:
After every shot fired there is a delay; this delay changes over time, and during the delay the player should still be able to move.
shoot
move
wait 700 ms
shoot again
At the moment I'm using Sleep(700); the problem is that I can't move during those 700 ms. I need something like a timer, so that the move command keeps executing during the 700 ms instead of just waiting 700 ms.
This depends on how your hypothetical 'sleep' is implemented. There are a few things you should know, as it can be solved in a few ways.
You don't want to put your thread to sleep because then everything halts, which is not what you want.
Plus you may sleep for more time than you asked for. For example, if you sleep for 700 ms you may get more than that, which means that if you depend on accurate timing you may get burned by this.
1) The first way would be to record the raw time inside the player. This is not the best approach, but it works for a simple toy program: record the result of std::chrono::high_resolution_clock::now() (check #include <chrono> or see here) inside the class at the time you fire. To check whether you can fire again, just compare the value you stored against ...::now() and see if 700 ms have elapsed. You will have to read the documentation to work with it in milliseconds.
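A minimal sketch of approach 1, using hypothetical class and member names (only the 700 ms figure comes from the question):

#include <chrono>

class Player {
public:
    // True once at least 700 ms have passed since the last shot.
    bool canFire() const {
        auto now = std::chrono::high_resolution_clock::now();
        return now - lastShot_ >= cooldown_;
    }

    void fire() {
        lastShot_ = std::chrono::high_resolution_clock::now();
        // ... spawn the projectile here ...
    }

private:
    std::chrono::high_resolution_clock::time_point lastShot_{};
    std::chrono::milliseconds cooldown_{700};   // the delay from the question
};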
2) A better way would be to give your game a pulse via something called 'game ticks', which is the pulse to which your world moves forward. Then you can store the gametick that you fired on and do something similar to the above paragraph (except now you are just checking if currentGametick > lastFiredGametick + gametickUntilFiring).
For the gametick idea, you would make sure you do gametick++ every X milliseconds, and then run your world. A common value is somewhere between 10ms and 50ms.
Your game loop would then look like
while (!exit) {
    readInput();
    if (ticker.shouldTick()) {
        ticker.tick();
        world.tick(ticker.gametick);
    }
    render();
}
The above has the following advantages:
You only update the world every gametick
You keep rendering between gameticks, so you can have smooth animations since you will be rendering at a very high framerate
If you want to halt, just spin in a while loop until the amount of time has elapsed
Now this has skipped over a significant amount of discussion, which you should definitely read this for if you are thinking of going the gametick route.
Whichever route you take, you probably need to read this.
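One possible shape for the hypothetical ticker used in the loop above, assuming a fixed tick length (the 25 ms default is just an example within the 10 ms to 50 ms range mentioned):

#include <chrono>

class Ticker {
public:
    explicit Ticker(std::chrono::milliseconds tickLength = std::chrono::milliseconds(25))
        : tickLength_(tickLength),
          nextTick_(std::chrono::steady_clock::now() + tickLength) {}

    // True when the next gametick is due.
    bool shouldTick() const {
        return std::chrono::steady_clock::now() >= nextTick_;
    }

    // Advance the tick counter and schedule the next deadline from the
    // previous one (not from "now"), so the ticks don't drift.
    void tick() {
        nextTick_ += tickLength_;
        ++gametick;
    }

    unsigned long gametick = 0;

private:
    std::chrono::milliseconds tickLength_;
    std::chrono::steady_clock::time_point nextTick_;
};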

IDirectXVideoDecoder performance

I am trying to understand some of the nuances of IDirectXVideoDecoder. CAVEAT: The conclusions stated below are not based on the DirectX docs or any other official source, but are my own observations and understandings. That said...
In normal use, IDirectXVideoDecoder is easily fast enough to process frames at any sensible frame rate. However, if you aren't rendering the frames based on timecodes and instead are going "as fast as possible," then you eventually run into a bottleneck in the decoder and IDirectXVideoDecoder::BeginFrame starts returning E_PENDING.
Apparently at any given time, a system can only have X frames active in the decoder. Attempting to submit X + 1 gives you this error until one of the previous frames completes. On my (somewhat older) box, X == 4. On my newer box, X == 8.
Which brings us to my first question:
Q1: How do I find out how many simultaneous decoding operations a system supports? What property/attribute describes this?
Then there's the question of what to do when you hit this error. I can think of 3 different approaches, but they all have drawbacks:
1) Just do a loop waiting for a decoder to free up:
do {
    hr = m_pVideoDecoder->BeginFrame(pSurface9Video[y], NULL);
} while (hr == E_PENDING);
On the plus side, this approach gives the fastest throughput. On the minus side, this causes a massive amount of CPU time to get burned waiting for a decoder to free up (>93% of my execution time gets spent here).
2) Do a loop, and add a Sleep:
do {
    hr = m_pVideoDecoder->BeginFrame(pSurface9Video[y], NULL);
    if (hr == E_PENDING)
        Sleep(1);
} while (hr == E_PENDING);
On the plus side, this significantly drops the CPU utilization. But on the minus side, it ends up slowing down the total throughput.
In trying to figure out why it's slowing things down, I made a few observations:
Normal time to process a frame on my system is ~4 milliseconds.
Sleep(1) can sleep for as much as 8 milliseconds, even when there are CPUs available to run on.
Frames sent to the decoders aren't being added to a queue and decoded one at a time. It actually performs X decodings at the same time.
The result of all this is that if you try to Sleep, one of the decoders frequently ends up sitting idle.
3) Before submitting the next frame for decoding, wait for one of the previous frames to complete:
// LockRect doesn't return until the surface is ready.
D3DLOCKED_RECT lr;
// I don't think this matters. It may always return the whole frame.
RECT r = {0, 0, 2, 2};
hr = pSurface9Video[old]->LockRect(&lr, &r, D3DLOCK_READONLY);
if (SUCCEEDED(hr))
    pSurface9Video[old]->UnlockRect();
This also drops the CPU usage, but it carries a throughput penalty as well. Maybe that's due to the 'surface' being in use longer than the 'decoder,' but more likely it's because of the time it takes to (pointlessly) transfer the frame back to system memory.
Which brings us to the second question:
Q2: Is there some way here to maximize throughput without pointlessly pounding on the CPU?
Final thoughts:
It appears that LockRect must be doing a WaitForSingleObject. If I had access to that handle, waiting on it (without also copying the frame back) seems like it would be the best solution. But I can't figure out where to get it. I've tried GetDC, GetPrivateData, even looking at the debug data members for IDirect3DSurface9. I'm not finding it.
IDirectXVideoDecoder::EndFrame outputs a handle in a parameter named pHandleComplete. This sounds like exactly what I need. Unfortunately it is marked as "reserved" and doesn't seem to work. Unless there is a trick?
I'm pretty new to DirectX, so maybe I've got this all wrong?
Update 1:
Re Q1: Turns out both my machines only support 4 decoders (oops). This will make it harder to determine which property I'm looking for: no property returns 8 on one machine and 4 on the other, but there are several that return 4 on both.
Re Q2: Since the (4) decoders are (presumably) shared between apps, the idea of finding out if the decoding is complete by (somehow) querying to see if the decoder is idle is a non-starter.
The call to create surfaces doesn't create handles (handle count stays the same across the call). So the idea of waiting on the "surface's handle" doesn't seem like it's going to pan out either.
The only idea I have left is to see if the surface is available by making some other call (besides LockRect) using it. So far I've tried calling StretchRect and ColorFill on a surface that the decoder is "still using," but they complete without error instead of blocking like LockRect.
There may not be a better answer here. So far it appears that for best performance, I should use #1. If CPU utilization is an issue, #2 is better than #1. If I'm going to be reading the surfaces back to memory anyway, then #3 makes sense; otherwise, stick with #1 or #2.

Achieving game engine determinism with threading

I would like to achieve determinism in my game engine, in order to be able to save and replay input sequences and to make networking easier.
My engine currently uses a variable timestep: every frame I calculate the time it took to update/draw the last one and pass it to my entities' update method. This makes 1000 FPS games seem as fast as 30 FPS games, but introduces nondeterministic behavior.
A solution could be fixing the game to 60 FPS, but that would add input latency and lose the benefits of higher frame rates.
So I've tried using a thread (which constantly calls update(1) and then sleeps for 16 ms) and drawing as fast as possible in the game loop. It kind of works, but it crashes often and my games become unplayable.
Is there a way to implement threading in my game loop to achieve determinism without having to rewrite all games that depend on the engine?
You should separate game frames from graphical frames. The graphical frames should only display the graphics, nothing else. For the replay it won't matter how many graphical frames your computer was able to execute, be it 30 per second or 1000 per second, the replaying computer will likely replay it with a different graphical frame rate.
But you should indeed fix the gameframes. E.g. to 100 gameframes per second. In the gameframe the game logic is executed: stuff that is relevant for your game (and the replay).
Your game loop should execute graphical frames whenever no game frame is due, so if you fix your game to 100 gameframes per second, that's 0.01 seconds per gameframe. If your computer only needed 0.001 seconds to execute the logic in the gameframe, the other 0.009 seconds are left for rendering graphical frames.
This is a small but incomplete and not 100% accurate example:
uint16_t const GAME_FRAMERATE = 100;
uint16_t const SKIP_TICKS = 1000 / GAME_FRAMERATE;

Timer sinceLoopStarted = Timer(); // Millisecond timer starting at 0
unsigned long next_game_tick = sinceLoopStarted.getMilliseconds();
while (gameIsRunning)
{
    //! Game Frames
    while (sinceLoopStarted.getMilliseconds() > next_game_tick)
    {
        executeGamelogic();
        next_game_tick += SKIP_TICKS;
    }

    //! Graphical Frames
    render();
}
The following link contains very good and complete information about creating an accurate gameloop:
http://www.koonsolo.com/news/dewitters-gameloop/
To be deterministic across a network, you need a single point of truth, commonly called "the server". There is a saying in the game community that goes "the client is in the hands of the enemy". That's true. You cannot trust anything that is calculated on the client for a fair game.
If, for example, your game gets easier when for some reason your thread only updates 59 times a second instead of 60, people will find out. Maybe at the start they won't even be malicious. They just had their machines under full load at the time and your process didn't get called 60 times a second.
Once you have a server (maybe even in-process as a thread in single player) that does not care about graphics or update cycles and runs at its own speed, it's deterministic enough to at least get the same results for all players. It might still not be 100% deterministic, given that the computer is not real-time. Even if you tell it to update at a given frequency, it might not, due to other processes on the computer taking too much load.
The server and clients need to communicate, so the server needs to send a copy of its state (for performance, maybe a delta from the last copy) to each client. The client can draw this copy at the best speed available.
If your game is crashing with the thread, maybe it's an option to actually put "the server" out of process and communicate via the network. That way you will find out pretty fast which variables would have needed locks, because once you move them to another project, your client will no longer compile.
Separate game logic and graphics into different threads. The game logic thread should run at a constant speed (say, it updates 60 times per second, or even higher if your logic isn't too complicated, to achieve smoother gameplay). Then your graphics thread should always draw the latest info provided by the logic thread, as fast as possible, to achieve high framerates.
In order to prevent partial data from being drawn, you should probably use some sort of double buffering, where the logic thread writes to one buffer, and the graphics thread reads from the other. Then switch the buffers every time the logic thread has done one update.
This should make sure you're always using the computer's graphics hardware to its fullest. Of course, this does mean you're putting constraints on the minimum cpu speed.
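A minimal sketch of that double buffering, assuming a hypothetical GameState snapshot type (the names and the mutex-based swap are illustrative, not a prescribed implementation):

#include <array>
#include <mutex>

// Hypothetical snapshot of the world; the field names are illustrative only.
struct GameState {
    float playerX = 0.0f;
    float playerY = 0.0f;
};

// The logic thread writes into the "back" slot and publishes it; the
// graphics thread always reads the most recently published slot.
class StateBuffer {
public:
    // Called by the logic thread after each fixed update.
    void publish(const GameState& s) {
        std::lock_guard<std::mutex> lock(swapMutex_);
        buffers_[back_] = s;
        front_ = back_;        // the just-written slot becomes readable
        back_  = 1 - back_;    // the next update writes the other slot
    }

    // Called by the graphics thread as often as it likes.
    GameState latest() const {
        std::lock_guard<std::mutex> lock(swapMutex_);
        return buffers_[front_];
    }

private:
    mutable std::mutex swapMutex_;
    std::array<GameState, 2> buffers_{};
    int front_ = 0;
    int back_  = 1;
};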
I don't know if this will help but, if I remember correctly, Doom stored your input sequences and used them to generate the AI behaviour and some other things. A demo lump in Doom would be a series of numbers representing not the state of the game, but your input. From that input the game would be able to reconstruct what happened and, thus, achieve some kind of determinism ... Though I remember it going out of sync sometimes.

Modify time for simulation in C++

I am writing a program which simulates an activity, and I am wondering how to speed up time for the simulation, so that, say, 1 hour in the real world is equal to 1 month in the program.
Thank you.
The program is actually similar to a restaurant simulation where you don't really know when customers come. Let's say we pick a random number (2-10) of customers every hour.
It depends on how the program gets the current time.
For example, if it calls the Linux system call time(), just replace that with your own function (like mytime) which returns speedier times. Perhaps mytime calls time() and multiplies the elapsed time by whatever factor makes sense; 1 hour = 1 month is a factor of 720. The origin, i.e. the moment the program begins, should be accounted for:
#include <ctime>

time_t t0;
time_t mytime(void *);

int main()
{
    t0 = time(NULL); // at program initialization
    // ....
    for (;;)
    {
        time_t sim_time = mytime(NULL);
        // yada yada yada
        // ...
    }
}

time_t mytime(void *)
{
    return 720 * (time(NULL) - t0); // account for time since program started
                                    // and magnify by 720, so one hour is one month
}
You just do it. You decide how many events take place in an hour of simulation time (e.g., if an event takes place once a second, then after 3600 simulated events you've simulated an hour of time). There's no need for your simulation to run in real time; you can run it as fast as you can calculate the relevant numbers.
It sounds like you are implementing a Discrete Event Simulation. You don't even need to have a free-running timer (no matter what scaling you may use) in such a situation. It's all driven by the events. You have a priority queue containing events, ordered by the event time. You have a processing loop which takes the event at the head of the queue, and advances the simulation time to the event time. You process the event, which may involve scheduling more events. (For example, the customerArrived event may cause a customerOrdersDinner event to be generated 2 minutes later.) You can easily simulate customers arriving using random().
The other answers I've read thus far are still assuming you need a continuous timer, which is usually not the most efficient way of simulating an event-driven system. You don't need to scale real time to simulation time, or have ticks. Let the events drive time!
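A minimal sketch of such a discrete event simulation, using the restaurant events from the question; the event names, the arrival distribution, and the 2-minute follow-up are illustrative assumptions:

#include <cstdlib>
#include <functional>
#include <iostream>
#include <queue>
#include <vector>

// A scheduled event: a simulated time (in minutes) and an action to run.
struct Event {
    double time;
    std::function<void()> action;
};

// Orders the priority queue so the earliest event is on top.
struct LaterEvent {
    bool operator()(const Event& a, const Event& b) const { return a.time > b.time; }
};

std::priority_queue<Event, std::vector<Event>, LaterEvent> eventQueue;
double simTime = 0.0;   // current simulated time in minutes

void scheduleCustomer(double arrival) {
    eventQueue.push({arrival, [arrival] {
        std::cout << "customer arrives at t=" << arrival << " min\n";
        // Processing this event schedules a follow-up event 2 minutes later.
        eventQueue.push({arrival + 2.0, [arrival] {
            std::cout << "customer orders dinner at t=" << arrival + 2.0 << " min\n";
        }});
    }});
}

int main() {
    // Seed some arrivals: roughly 2-10 customers per simulated hour, as in the question.
    double t = 0.0;
    for (int i = 0; i < 5; ++i) {
        t += 60.0 / (2 + std::rand() % 9);   // rough random inter-arrival gap in minutes
        scheduleCustomer(t);
    }
    // The processing loop jumps straight to the next event; no real-time waiting.
    while (!eventQueue.empty()) {
        Event e = eventQueue.top();
        eventQueue.pop();
        simTime = e.time;   // advance simulated time to the event time
        e.action();
    }
}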
If the simulation is data dependent (like a stock market program), just speed up the rate at which the data is pumped. If it is something that depends on time() calls, you will have to do something like wallyk's answer (assuming you have the source code).
If time in your simulation is discrete, one option is to structure your program so that something happens "every tick".
Once you do that, time in your program is arbitrarily fast.
Is there really a reason for having a month of simulation time correspond exactly to an hour of time in the real world? If yes, you can always process the number of ticks that correspond to a month, and then pause for the appropriate amount of time to let an hour of "real time" finish.
Of course, a key variable here is the granularity of your simulation, i.e. how many ticks correspond to a second of simulated time.
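A sketch of that idea, under the assumption of one tick per simulated second and a stubbed simulateOneTick(); the names are hypothetical:

#include <chrono>
#include <thread>

// Assumed granularity: one tick per simulated second.
constexpr long TICKS_PER_SIM_MONTH = 60L * 60 * 24 * 30;

// Stub: advance the simulated world by one tick.
void simulateOneTick() { /* ... */ }

// Run a month's worth of ticks as fast as possible, then pause so that the
// simulated month lines up with exactly one hour of real time.
void runOneSimulatedMonth()
{
    auto wallStart = std::chrono::steady_clock::now();
    for (long t = 0; t < TICKS_PER_SIM_MONTH; ++t)
        simulateOneTick();
    std::this_thread::sleep_until(wallStart + std::chrono::hours(1));
}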