In an MFC-program I built myself I have some weird problems with the CPU usage.
I load a point cloud of around 360k points and everything works fine (I use VBO buffers which is the way to do it from what I understand?). I can move it around as I please and notice no adverse effects (CPU usage is very low, GPU does all the work). But then at certain angles and zoom values I see the CPU spike on one of my processors! I can then change the angle or zoom a little and it will go down to around 0 again. This is more likely to happen in a large window than a small one.
I measure the FPS of the program and it's constantly at 65, but when the CPU spike hits it typically goes down around 10 units to 55. I also measure the time SwapBuffers take and during normal operation it's around 0-1 ms. Once the CPU spike hits it goes up to around 20 ms, so it's clear something suddenly gets very hard to calculate in that function (for the GPU I guess?). This something is not in the DrawScene function (which is the function one would expect to eat CPU in a poor implementation), so I'm at a bit of a loss.
I know it's not due to the number of points visible because this can just as easily happen on just a sub-section of the data as on the whole cloud. I've tried to move it around and see if it's related to the depth buffer, clipping or similar but it seems entirely random what angles create the problem. It does seem somewhat repeatable though; moving the model to a position that was laggy once will be laggy when moved there again.
I'm very new at OpenGL so it's not impossible I've made some totally obvious error.
This is what the render loop looks like (it's run in an MFC app via a timer event with 1 ms period):
// Clear color and depth buffer bits
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
// Draw OpenGL scene
OGLDrawScene();
unsigned int time1 = timeGetTime();
// Swap buffers
SwapBuffers(hdc);
// Calculate execution time for SwapBuffers
m_time = timeGetTime() - time1;
// Calculate FPS
++m_cnt;
if (timeGetTime() - m_lastTime > 1000)
{
m_fps = m_cnt;
m_cnt = 0;
m_lastTime = timeGetTime();
}
I've noticed that (at least a while back), ATI drivers tend to like to spin-wait a little too aggressively, while NVidia drivers tend to be interrupt driven. Technically spin-waiting is faster, assuming you have nothing better to do. Unfortunately, today you probably do have something better to do on another thread.
I think the OP's display drivers may indeed be spin-waiting.
Ok, so I think I have figured out how this can happen.
To begin with the WM_TIMER messages doesn't seem to be generated more often than every 15 ms at best, even when using timeBeginPeriod(1), at least on my computer. This resulted in the standard 65 fps I was seeing.
As soon as the scene took more than 15 ms to render, SwapBuffers would be the limiting factor instead. SwapBuffers seem to busy-wait, which resulted in 100% CPU usage on one core when this happened. This is not something that occurred at certain camera angles, but is a result of a fluidly changing fps depending on how many points were shown on the screen at the time. It just appeared to spike whenever rendering happened to hit a time over 15 ms and started to wait at SwapBuffers instead.
On a similar note, does anyone know of a function like "glReadyToSwap" or something like that? A function that indicates if the buffers are ready to be swapped? That way I could choose another method for rendering with a higher resolution (1 ms for example) and then each ms check if the buffers are ready to swap, if they aren't just wait another ms, so as to not busy-wait.
Related
I'm rendering a top-down, tile-based world, using opengl 3.3, using fully streamed VBO's.
After encountering some lag I did some benchmarking and what I found was horrid!
Let me explain the picture. The first marked square is me running my game using the simplest of shaders. There is no lightning, no nothing! I'm simply uploading 5000 vertices and draw them. My memory load is about 20-30%, cpu-load 30-40%
The second is with lightning. Every light is uploaded as an array to the fragment shader and every fragment processes the lights. load about 40-50%. 100% with 60 lights.
The third is with deferred shading. First I draw normal and diffuse to a FBO, then I render each light to the default FB, while reading from these. load is about 80%. Basically unaffected by amount of lights.
These are the scenes I render:
As you can see, there's nothing fancy. It's retro style. My plan has been to add tons of complexity and still run smooth on low-end computers. Mine is a i7 nvidia 660M, so it shouldn't have a problem.
For comparison I ran warcraft 3 and it took about 50-60% load, 20% memory.
One strange thing I've noticed is that if I disable V-sync and don't call glFinish before swapbuffers, load goes down significantly. However, the clock goes up and heat is produced (53C*).
Now, first I'm wondering if you think this is normal. If not, then what could be my bottleneck? Could it be my streaming VBO? I've tried double buffering and orphaning, but nothing. Doubling the number of sprites basically increases the memory load by 5-10%. the gpu-load remains basically the same.
I'm aware this question can't be easily answered, but I'll provide more details as you require them. Don't want to post my 20000 lines of code here.
Oh, and one more thing... It fluctuates. The draw calls are identical, but the load can go from 2-100%, whenever it feels like it.
UPDATE:
my main loop looks like this:
swapbuffers
renderAndDoGlCalls
updateGameAndPoll
sleep if there's any time left (1/60th second)
repeat.
Without v-sync, glflush or glfinsih, this results in percentage used:
swap: 0.16934400677376027
ren: 0.9929640397185616
upp:0.007698000307920012
poll:0.0615780024631201
sleep: 100.39487801579511
With glFinish prior to swapbuffers:
swap: 26.609977064399082 (this usually goes up to 80%)
ren: 1.231584049263362
upp:0.010266000410640016
poll:0.07697400307896013
sleep: 74.01582296063292
with Vsync it starts well, usually the same as with glFinish, then bam!:
swap: 197.84934791397393
ren: 1.221324048852962
upp:0.007698000307920012
poll:0.05644800225792009
sleep: 0.002562000102480004
And it stays that way.
Let me clarify... If I call swapbuffers right after all opengl calls, my CPU stalls for 70% of update-time, letting me do nothing. This way, I give the GPU the longest possible time to finish the backbuffer before I call the swap again.
You are actually inadvertently causing the opposite scenario.
The only time SwapBuffers causes the calling thread to stall is when the pre-rendered frame queue is full and it has to wait for VSYNC to flush a finished frame. The CPU could easily be a good 2-3 frames ahead of the GPU at any given moment, and it is not the current frame finishing that causes waiting (there's already a finished frame that needs to be swapped in this scenario).
Waiting happens because the driver cannot swap the backbuffer from back to front until the VBLANK signal rolls around (which only occurs once every 16.667ms). The driver will actually continue to accept commands while it is waiting for a swap up until it hits a certain limit (pre-rendered frames on NVIDIA hardware / flip queue size on AMD) worth of queued swaps. Once that limit is hit, GL commands will cause blocking until the back buffer(s) is/are swapped.
You are sleeping at the end of your frames, so no appreciable CPU/GPU parallelism ever develops; in fact you are more likely to skip a frame this way.
That is what you are seeing here. The absolute worst-case scenario is when you sleep for 1 ms too late to swap buffers in time for VBLANK. Your time between two frames then becomes 16.66667 + 15.66667 = 32.33332 ms. This causes a stutter that would not have happened if you did not add your own wait time. The driver could have easily copied the backbuffer from back to front and continued accepting commands in that 1 extra ms you added, but instead it blocks for an additional 15 at the beginning of the next frame.
To avoid this, you want to swap buffers as soon as possible after all commands for a frame have been issued. You have the best likelihood of meeting the VBLANK deadline this way. Reported CPU usage may go up since less time is spent sleeping, but performance should be measured using frame time rather than scheduled CPU time.
VSYNC and the pre-rendered frame limit discussed will keep your CPU and GPU from running out of control and generating huge amounts of heat as mentioned in the question.
Coming from a basic understanding of OpenGL programming, all required drawing operations are performed in a sequence, once per frame redraw. The performance of the hardware dictates essentially how fast this happens. As I understand, a game will attempt to draw as quickly as possible so redraw operations are essentially wrapped in a while loop. The graphics operations (graphics engine) will then be optimised to ensure the frame rate is acceptable for the application.
Graphics hardware supporting Vertical Synchronisation however locks frame rates to the display rate. A first question would be how should a graphics engine interact with the hardware synchronisation? Is this even possible or does the renderer work at maximum speed and the hardware selectively calls up the latest frame, discarding all unused previous frames..?
The motivation for this question is not that I am immediately intending to write a graphics engine, instead am debugging an issue with an existing system where the graphics of a moving scene appear to stutter onscreen. Symptomatically, the stutter is slight when VSync is turned off, when it is turned on either there is a significant and periodic stutter or alternatively the stutter is resolved entirely. I am somewhat clutching at straws as to what is happening or why, want to understand some more background information on graphics systems.
Summarily the question would be on how one is expected to interact with hardware redraw events and if that is even possible. However any additional information would be welcome.
A first question would be how should a graphics engine interact with the hardware synchronisation?
To avoid flicker modern rendering systems use double buffering i.e. there are two color plane buffers and after finishing drawing to one, the display readout pointer is set to the finished buffer plane. This buffer swap can happen synchronized or non-synchronized. With V-Sync enabled the buffer swap will be synchronized and the rendering thread blocks until the buffer swap happened.
Since with double buffering mandates buffer swaps this implicitly introduces a synchronization mechanism. This is how interactive rendering systems lock onto the display refresh.
Symptomatically, the stutter is slight when VSync is turned off, when it is turned on either there is a significant and periodic stutter or alternatively the stutter is resolved entirely.
This sounds like a badly written animation loop that assumes constant framerate locked onto the display refresh rate, based on the assumption that frames render faster than a display refresh interval and the buffer swap can be issued in time for the next retrace to happen.
The only robust way to deal with vertical synchronization is to actually measure the time between frame renderings and advance the rendering loop by that amount of time.
This is a guess, but:
The Problem Isn't Vertical Synchronization
I don't know what OS you're working with, but there are various ways to get information about the monitor and how fast the screen is refreshing (for the purposes of this answer, we'll assume your monitor is somewhat recent and redraws at a rate of 60 Hz, or 60 times every second, or once every 16.66666... milliseconds).
Renderers are usually paired up with an "Logic" side to the application: input, ui calculations, simulation running, etc. etc. It seems like the logic side of your application is running fast enough, but the Rendering side - i.e., the Draw Call as its commonly summed up into - is bounding the speed of your application.
Vertical Synchronization can exacerbate this in that if your Draw Call is made to happen every 16.66666 milliseconds - but it takes much longer than 16.666666 milliseconds - then you perceive a frame rate drop (i.e. frames will "stutter" because they're taking too long to produce a single frame). VSync - and the enabling or disabling thereof - is not something that bottlenecks your code: it just says "hey, since the Hardware is only going to take 1 frame from us every 16.666666 milliseconds, why make more draw calls than just one every 16.66666 milliseconds? As long as we do one draw call once for every passing of this time, our application will look as fluid as possible, and we don't have to waste time making more calls than that!"
The problem with that is that it assumes your code is going to run fast enough to make it in those 16.6666 milliseconds. If it does not, stuttering, lagging, visual artifacts, frozen frames, and other things manifest themselves on screen.
When you turn off VSync, you're telling your Render Call to be called as often as possible, as fast as possible. This may give it some extra wiggle room alongside the Logic call to get a frame rendered, so that when the Hardware Says "I'm gonna take a picture and put it on the screen now!" it's all prettied up, just in time, to get into posture and say cheese! (though by what you say, it barely makes it).
What To Do:
Start by profiling your code. Find out what functions are taking the most time. Judging by the stutter, something in your code is taking longer than is expected and is giving you undesirable performance. Make sure to profile first to find the critical sections of where you're burning away time, and figure out how to keep it correct and make it just as fast. You may want to figure out what's being called in the Render Call and profile the time it takes to complete one cycle of that specifically. Then time the Logic call(s) and see how long it takes to execute those as well. Then, chop away.
Good luck!
I am currently working on an OpenGL application to display a few 3D spheres to the user, which they can rotate, move around, etc. That being said, there's not much in the way of complexity here, so the application runs at quite a high framerate (~500 FPS).
Obviously, this is overkill - even 120 would be more then enough, but my issue is that running the application at full-stat eats away my CPU, causing excess heat, power consumption, etc. What I want to do is be able to let the user set an FPS cap so that the CPU isn't being overly used when it doesn't need to be.
I'm working with freeglut and C++, and have already set up the animations/event handling to use timers (using the glutTimerFunc). The glutTimerFunc, however, only allows an integer amount of milliseconds to be set - so if I want 120 FPS, the closest I can get is (int)1000/120 = 8 ms resolution, which equates to 125 FPS (I know it's a neglegible amount, but I still just want to put in an FPS limit and get exactly that FPS if I know the system can render faster).
Furthermore, using glutTimerFunc to limit the FPS never works consistently. Let's say I cap my application to 100 FPS, it usually never goes higher then 90-95 FPS. Again, I've tried to work out the time difference between rendering/calculations, but then it always overshoots the limit by 5-10 FPS (timer resolution possibly).
I suppose the best comparison here would be a game (e.g. Half Life 2) - you set your FPS cap, and it always hits that exact amount. I know I could measure the time deltas before and after I render each frame and then loop until I need to draw the next one, but this doesn't solve my 100% CPU usage issue, nor does it solve the timing resolution issue.
Is there any way I can implement an effective, cross-platform, variable frame rate limiter/cap in my application? Or, in another way, is there any cross-platform (and open source) library that implements high resolution timers and sleep functions?
Edit: I would prefer to find a solution that doesn't rely on the end user enabling VSync, as I am going to let them specify the FPS cap.
Edit #2: To all who recommend SDL (which I did end up porting my application to SDL), is there any difference between using the glutTimerFunc function to trigger a draw, or using SDL_Delay to wait between draws? The documentation for each does mention the same caveats, but I wasn't sure if one was more or less efficient then the other.
Edit #3: Basically, I'm trying to figure out if there is a (simple way) to implement an accurate FPS limiter in my application (again, like Half Life 2). If this is not possible, I will most likely switch to SDL (makes more sense to me to use a delay function rather then use glutTimerFunc to call back the rendering function every x milliseconds).
I'd advise you to use SDL. I personnally use it to manage my timers. Moreover, it can limit your fps to your screen refresh rate (V-Sync) with SDL 1.3. That enables you to limit CPU usage while having the best screen performance (even if you had more frames, they wouldn't be able to be displayed since your screen doesn't refresh fast enough).
The function is
SDL_GL_SetSwapInterval(1);
If you want some code for timers using SDL, you can see that here :
my timer class
Good luck :)
I think a good way to achieve this, no matter what graphics library you use, is to have a single clock measurement in the gameloop to take every single tick (ms) into account. That way the average fps will be exactly the limit just like in Half-Life 2. Hopefully the following code snippet will explain what I am talking about:
//FPS limit
unsigned int FPS = 120;
//double holding clocktime on last measurement
double clock = 0;
while (cont) {
//double holding difference between clocktimes
double deltaticks;
//double holding the clocktime in this new frame
double newclock;
//do stuff, update stuff, render stuff...
//measure clocktime of this frame
//this function can be replaced by any function returning the time in ms
//for example clock() from <time.h>
newclock = SDL_GetTicks();
//calculate clockticks missing until the next loop should be
//done to achieve an avg framerate of FPS
// 1000 / 120 makes 8.333... ticks per frame
deltaticks = 1000 / FPS - (newclock - clock);
/* if there is an integral number of ticks missing then wait the
remaining time
SDL_Delay takes an integer of ms to delay the program like most delay
functions do and can be replaced by any delay function */
if (floor(deltaticks) > 0)
SDL_Delay(deltaticks);
//the clock measurement is now shifted forward in time by the amount
//SDL_Delay waited and the fractional part that was not considered yet
//aka deltaticks
the fractional part is considered in the next frame
if (deltaticks < -30) {
/*dont try to compensate more than 30ms(a few frames) behind the
framerate
//when the limit is higher than the possible avg fps deltaticks
would keep sinking without this 30ms limitation
this ensures the fps even if the real possible fps is
macroscopically inconsitent.*/
clock = newclock - 30;
} else {
clock = newclock + deltaticks;
}
/* deltaticks can be negative when a frame took longer than it should
have or the measured time the frame took was zero
the next frame then won't be delayed by so long to compensate for the
previous frame taking longer. */
//do some more stuff, swap buffers for example:
SDL_RendererPresent(renderer); //this is SDLs swap buffers function
}
I hope this example with SDL helps. It is important to measure the time only once per frame so every frame is taken into account.
I recommend to modularize this timing in a function which also makes your code clearer. This code snipped has no comments in the case they just annoyed you in the last one:
unsigned int FPS = 120;
void renderPresent(SDL_Renderer * renderer) {
static double clock = 0;
double deltaticks;
double newclock = SDL_GetTicks();
deltaticks = 1000.0 / FPS - (newclock - clock);
if (floor(deltaticks) > 0)
SDL_Delay(deltaticks);
if (deltaticks < -30) {
clock = newclock - 30;
} else {
clock = newclock + deltaticks;
}
SDL_RenderPresent(renderer);
}
Now you can call this function in your mainloop instead of your swapBuffer function (SDL_RenderPresent(renderer) in SDL). In SDL you'd have to make sure the SDL_RENDERER_PRESENTVSYNC flag is turned off. This function relies on the global variable FPS but you can think of other ways of storing it. I just put the whole thing in my library's namespace.
This method of capping the framerate delivers exactly the desired average framerate if there are no large differences in the looptime over multiple frames because of the 30ms limit to deltaticks. The deltaticks limit is required. When the FPS limit is higher than the actual framerate deltaticks will drop indefinitely. Also when the framerate then rises above the FPS limit again the code would try to compensate the lost time by rendering every frame immediately resulting in a huge framerate until deltaticks rises back to zero. You can modify the 30ms to fit your needs, it is just an estimate by me. I did a couple of benchmarks with Fraps. It works with every imaginable framerate really and delivers beautiful results from what I have tested.
I must admit I coded this just yesterday so it is not unlikely to have some kind of bug. I know this question was asked 5 years ago but the given answers did not statify me. Also feel free to edit this post as it is my very first one and probably flawed.
EDIT:
It has been brought to my attention that SDL_Delay is very very inaccurate on some systems. I heard a case where it delayed by far too much on android. This means my code might not be portable to all your desired systems.
The easiest way to solve it is to enable Vsync. That's what I do in most games to prevent my laptop from getting too hot.
As long as you make sure the speed of your rendering path is not connected to the other logic, this should be fine.
There is a function glutGet( GLUT_ELAPSED_TIME ) which returns the time since started in miliseconds, but that's likely still not fast enough.
A simple way is to make your own timer method, which uses the HighPerformanceQueryTimer on windows, and the getTimeOfDay for POSIX systems.
Or you can always use timer functions from SDL or SFML, which do basically the same as above.
You should not try to limit the rendering rate manually, but synchronize with the display vertical refresh. This is done by enabling V sync in the graphics driver settings. Apart from preventing (your) programs from rendering at to high rate, it also increases picture quality by avoiding tearing.
The swap interval extensions allow your application to fine tune the V sync behaviour. But in most cases just enabling V sync in the driver and letting the buffer swap block until sync suffices.
I would suggest using sub-ms precision system timers (QueryPerformanceCounter, gettimeofday) to get timing data. These can help you profile performance in optimized release builds also.
Some background information:
SDL_Delay is pretty much the same as Sleep/sleep/usleep/nanosleep but it is limited to milliseconds as parameter
Sleeping works by relying on the systems thread scheduler to continue your code.
Depending on your OS and hardware the scheduler may have a lower tick frequency than 1000hz, which results in longer timespans than you have specified when calling sleep, so you have no guarantee to get the desired sleep time.
You can try to change the scheduler's frequency. On Windows you can do it by calling timeBeginPeriod(). For linux systems checkout this answer.
Even if your OS supports a scheduler frequency of 1000hz your hardware may not, but most modern hardware does.
Even if your scheduler's frequency is at 1000hz sleep may take longer if the system is busy with higher priority processes, but this should not happen if your system isn't under super high load.
To sum up, you may sleep for microseconds on some tickless linux kernels, but if you are interested in a cross platform solution you should try to get the scheduler frequency up to 1000hz to ensure the sleeps are accurate in most of the cases.
To solve the rounding issue for 120FPS:
1000/120 = 8,333ms
(int)1000/120 = 8ms
Either you do a busy wait for 333 microseconds and sleep for 8ms afterwards. Which costs some CPU time but is super accurate.
Or you follow Neop approach by sleeping sometimes 8ms and sometimes 9ms seconds to average out at 8,333ms. Which is way more efficent but less accurate.
I have 200 frames to be displayed per second. The frames are very very simple, black and white, just a couple of lines. A timer is driving the animation. The goal is to play back the frames at approx. 200 fps.
Under Linux I set the timer to 5 ms and I let it display every frame (that is 200fps). Works just fine but it fails under Win 7.
Under Win 7 (the same machine) I had to set the timer to 20 ms and let it display every 4 frame (50 fps × 4 = 200). I found these magic numbers by trial and error.
What should I do to guarantee (within reasonable limits) that the animation will be played back at a proper speed on the user's machine?
For example, what if the user's machine can only do 30 fps or 60 fps?
The short answer is, you can't (in general).
For best aesthetics, most windowing systems have "vsync" on by default, meaning that screen redraws happen at the refresh rate of the monitor. In the old CRT days, you might be able to get 75-90 Hz with a high-end monitor, but with today's LCDs you're likely stuck at 60 fps.
That said, there are OpenGL extensions that can disable VSync (don't remember the extension name off hand) programmatically, and you can frequently disable it at the driver level. However, no matter what you do (barring custom hardware), you're not going to be able to display complete frames at 200 fps.
Now, it's not clear if you've got pre-rendered images that you need to display at 200 fps, or if you're rendering from scratch and hoping to achieve 200 fps. If it's the former, a good option might be to use a timer to determine which frame you should display (at each 60 Hz. update), and use that value to linearly interpolate between two of the pre-rendered frames. If it's the latter, I'd just use the timer to control motion (or whatever is dynamic in your scene) and render the appropriate scene given the time. Faster hardware or disabled VSYNC will give you more frames (hence smoother animation, modulo the tearing) in the same amount of time, etc. But the scene will unfold at the right pace either way.
Hope this is helpful. We might be able to give you better advice if you give a little more info on your application and where the 200 fps requirement originates.
I've already read, that you've data sampled at 200Hz, which you want to play back at natural speed. I.e. one second of sampled data shall be rendered over one second time.
First: Forget about using timers to coordinate your rendering, this is unlikely to work properly. Instead you should measure the time a full rendering cycle (including v-sync) takes and advance the animation time-counter by this. Now 200Hz is already some very good time resolution, so if the data is smooth enough, then there should be no need to interpolate at all. So something like this (Pseudocode):
objects[] # the objects, animated by the animation
animation[] # steps of the animation, sampled at 200Hz
ANIMATION_RATE = 1./200. # Of course this shouldn't be hardcoded,
# but loaded with the animation data
animationStep = 0
timeLastFrame = None
drawGL():
timeNow = now() # time in seconds with (at least) ms-accuracy
if timeLastFrame:
stepTime = timeNow - timeLastFrame
else:
stepTime = 0
animationStep = round(animationStep + stepTime * ANIMATION_RATE)
drawObjects(objects, animation[animationStep])
timeLastFrame = timeNow
It may be, that your rendering is much faster than the time between screen refreshs. In that case you may want to render some of the intermediate steps, too, to get some kind of motion blur effect (you can also use the animation data to obtain motion vectors, which can be used in a shader to create a vector blur effect), the render loop would then look like this:
drawGL():
timeNow = now() # time in seconds with (at least) ms-accuracy
if timeLastFrame:
stepTime = timeNow - timeLastFrame
else:
stepTime = 0
timeRenderStart = now()
animationStep = round(animationStep + stepTime * ANIMATION_RATE)
drawObjects(objects, animation[animationStep])
glFinish() # don't call SwapBuffers
timeRender = now() - timeRenderStart
setup_GL_for_motion_blur()
intermediates = floor(stepTime / timeRender) - 1 # subtract one to get some margin
backstep = ANIMATION_RATE * (stepTime / intermediates)
if intermediates > 0:
for i in 0 to intermediates:
drawObjects(objects, animation[animationStep - i * backstep])
timeLastFrame = timeNow
One way is to sleep for 1ms at each iteration of your loop and check how much time has passed.
If more than the target amount of time has passed (for 200fps that is 1000/200 = 5ms), then draw a frame. Else, continue to the next iteration of the loop.
E.g. some pseudo-code:
target_time = 1000/200; // 200fps => 5ms target time.
timer = new timer(); // Define a timer by whatever method is permitted in your
// implementation.
while(){
if(timer.elapsed_time < target_time){
sleep(1);
continue;
}
timer.reset(); // Reset your timer to begin counting again.
do_your_draw_operations_here(); // Do some drawing.
}
This method has the advantage that if the user's machine is not capable of 200fps, you will still draw as fast as possible, and sleep will never be called.
There are probably two totally independant factors to consider here:
How fast is the users machine? It could be that you are not achieving your target frame rate due to the fact that the machine is still processing the last frame by the time it is ready to start drawing the next frame.
What is the resolution of the timers you are using? My impression (although I have no evidence to back this up) is that timers under Windows operating systems provide far poorer resolution than those under Linux. So you might be requesting a sleep of (for example) 5 mS, and getting a sleep of 15 mS instead.
Further testing should help you figure out which of these two scenarios is more pertinent to your situation.
If the problem is a lack of processing power, you can choose to display intermediate frames (as you are doing now), or degrade the visuals (lower quality, lower resolution, or anything else that might help speed thigns up).
If the problem is timer resolution, you can look at alternative timer APIs (Windows API provides two different timer functionc alls, each with different resolutions, perhaps you are using the wrong one), or try and compensate by asking for smaller time slices (as in Kdoto's suggestion). However, doing this may actually degrade performance, since you're now doing a lot more processing than you were before - you may notice your CPU usage spike under this method.
Edit:
As Drew Hall mentions in his answer, there's another whole site to this: The refresh rate you get in code may be very different to the actual refresh rate appearing on screen. However, that's output device dependent, and it sounds from your question like the issue is in code, rather than in output hardware.
Is there any way to calculate how much updates should be made to reach desired frame rate, NOT system specific? I found that for windows, but I would like to know if something like this exists in openGL itself. It should be some sort of timer.
Or how else can I prevent FPS to drop or raise dramatically? For this time I'm testing it on drawing big number of vertices in line, and using fraps I can see frame rate to go from 400 to 200 fps with evident slowing down of drawing it.
You have two different ways to solve this problem:
Suppose that you have a variable called maximum_fps, which contains for the maximum number of frames you want to display.
Then You measure the amount of time spent on the last frame (a timer will do)
Now suppose that you said that you wanted a maximum of 60FPS on your application. Then you want that the time measured be no lower than 1/60. If the time measured s lower, then you call sleep() to reach the amount of time left for a frame.
Or you can have a variable called tick, that contains the current "game time" of the application. With the same timer, you will incremented it at each main loop of your application. Then, on your drawing routines you calculate the positions based on the tick var, since it contains the current time of the application.
The big advantage of option 2 is that your application will be much easier to debug, since you can play around with the tick variable, go forward and back in time whenever you want. This is a big plus.
Rule #1. Do not make update() or loop() kind of functions rely on how often it gets called.
You can't really get your desired FPS. You could try to boost it by skipping some expensive operations or slow it down by calling sleep() kind of functions. However, even with those techniques, FPS will be almost always different from the exact FPS you want.
The common way to deal with this problem is using elapsed time from previous update. For example,
// Bad
void enemy::update()
{
position.x += 10; // this enemy moving speed is totally up to FPS and you can't control it.
}
// Good
void enemy::update(elapsedTime)
{
position.x += speedX * elapsedTime; // Now, you can control its speedX and it doesn't matter how often it gets called.
}
Is there any way to calculate how much updates should be made to reach desired frame rate, NOT system specific?
No.
There is no way to precisely calculate how many updates should be called to reach desired framerate.
However, you can measure how much time has passed since last frame, calculate current framerate according to it, compare it with desired framerate, then introduce a bit of Sleeping to reduce current framerate to the desired value. Not a precise solution, but it will work.
I found that for windows, but I would like to know if something like this exists in openGL itself. It should be some sort of timer.
OpenGL is concerned only about rendering stuff, and has nothing to do with timers. Also, using windows timers isn't a good idea. Use QueryPerformanceCounter, GetTickCount or SDL_GetTicks to measure how much time has passed, and sleep to reach desired framerate.
Or how else can I prevent FPS to drop or raise dramatically?
You prevent FPS from raising by sleeping.
As for preventing FPS from dropping...
It is insanely broad topic. Let's see. It goes something like this: use Vertex buffer objects or display lists, profile application, do not use insanely big textures, do not use too much alpha-blending, avoid "RAW" OpenGL (glVertex3f), do not render invisible objects (even if no polygons are being drawn, processing them takes time), consider learning about BSPs or OCTrees for rendering complex scenes, in parametric surfaces and curves, do not needlessly use too many primitives (if you'll render a circle using one million polygons, nobody will notice the difference), disable vsync. In short - reduce to absolute possible minimum number of rendering calls, number of rendered polygons, number of rendered pixels, number of texels read, read every available performance documentation from NVidia, and you should get a performance boost.
You're asking the wrong question. Your monitor will only ever display at 60 fps (50 fps in Europe, or possibly 75 fps if you're a pro-gamer).
Instead you should be seeking to lock your fps at 60 or 30. There are OpenGL extensions that allow you to do that. However the extensions are not cross platform (luckily they are not video card specific or it'd get really scary).
windows: wglSwapIntervalEXT
x11 (linux): glXSwapIntervalSGI
max os x: ?
These extensions are closely tied to your monitor's v-sync. Once enabled calls to swap the OpenGL back-buffer will block until the monitor is ready for it. This is like putting a sleep in your code to enforce 60 fps (or 30, or 15, or some other number if you're not using a monitor which displays at 60 Hz). The difference it the "sleep" is always perfectly timed instead of an educated guess based on how long the last frame took.
You absolutely do wan't to throttle your frame-rate it all depends on what you got
going on in that rendering loop and what your application does. Especially with it's
Physics/Network related. Or if your doing any type of graphics processing with an out side toolkit (Cairo, QPainter, Skia, AGG, ...) unless you want out of sync results or 100% cpu usage.
This code may do the job, roughly.
static int redisplay_interval;
void timer(int) {
glutPostRedisplay();
glutTimerFunc(redisplay_interval, timer, 0);
}
void setFPS(int fps)
{
redisplay_interval = 1000 / fps;
glutTimerFunc(redisplay_interval, timer, 0);
}
Here is a similar question, with my answer and worked example
I also like deft_code's answer, and will be looking into adding what he suggests to my solution.
The crucial part of my answer is:
If you're thinking about slowing down AND speeding up frames, you have to think carefully about whether you mean rendering or animation frames in each case. In this example, render throttling for simple animations is combined with animation acceleration, for any cases when frames might be dropped in a potentially slow animation.
The example is for animation code that renders at the same speed regardless of whether benchmarking mode, or fixed FPS mode, is active. An animation triggered before the change even keeps a constant speed after the change.