Rendering behaves very strangely: over time the fps starts to fall very hard, by up to 70%. I've tried reducing the number of objects to render and simplifying the shaders (1-3 operations), but it didn't solve the problem. When profiling the CPU in Visual Studio I can see IDXGISwapChain::Present taking more and more time, even though the scene is static and nothing in it changes.
To give you an example, it goes like this:
Run the application: 60 fps
Wait 30 seconds: 50 fps
Another 30 seconds: 30 fps
Minimize the application and wait 30 seconds (I add a delay for an inactive application, something like Sleep(100))
Return to the application: 60 fps
Wait 30 seconds: 50 fps
30 more seconds: 30 fps
I have simplified the shaders to just a couple of operations (1-3) in them.
This is happening on different PCs. It also doesn't depend on the complexity of the scene; the initial fps is just higher for simpler scenes. I tried everything I found on Stack Overflow, but nothing solved the problem.
I only get one warning in debug mode, and I don't think it's the cause of all this:
D3D11 WARNING: ID3D11DeviceContext::DrawIndexed: The Pixel Shader expects a Render Target View bound to slot 0, but none is bound. This is OK, as writes of an unbound Render Target View are discarded. It is also possible the developer knows the data will not be used anyway. This is only a problem if the developer actually intended to bind a Render Target View here. [ EXECUTION WARNING #3146081: DEVICE_DRAW_RENDERTARGETVIEW_NOT_SET]
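For completeness, the warning itself is easy to silence by explicitly re-binding the render target before drawing; this is only a sketch of the call order with placeholder names, not my exact code:
#include <d3d11.h>

// Sketch: re-bind the render target each frame before drawing so slot 0 is never
// empty when DrawIndexed runs. ctx, rtv, dsv and indexCount are placeholders for
// whatever the application already has.
void DrawWithBoundTarget(ID3D11DeviceContext* ctx,
                         ID3D11RenderTargetView* rtv,
                         ID3D11DepthStencilView* dsv,
                         UINT indexCount)
{
    ID3D11RenderTargetView* targets[1] = { rtv };
    ctx->OMSetRenderTargets(1, targets, dsv); // bind before any draw call
    ctx->DrawIndexed(indexCount, 0, 0);
}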
SUMMARY
It seems that vsync with OpenGL is broken on Windows in windowed mode. I've tried different APIs (SDL, glfw, SFML), all with the same result: While the framerate is limited (and consistently around 16-17 ms according to CPU measurements on multiple 60 Hz setups I've tried), and the CPU is in fact sleeping most of the time, frames are very often skipped. Depending on the machine and the CPU usage for things other than rendering, this can be as bad as effectively cutting the frame rate in half. This problem does not seem to be driver related.
How can I get working vsync on Windows with OpenGL in windowed mode, or a similar effect with these properties (if I forgot something notable, or if something is not sensible, please comment):
CPU can sleep most of the time
No tearing
No skipped frames (under the assumption that the system is not overloaded)
CPU gets to know when a frame has actually been displayed
DETAILS / SOME RESEARCH
When I googled opengl vsync stutter or opengl vsync frame drop or similar queries, I found that many people are having this issue (or a very similar one), yet there seems to be no coherent solution to the actual problem (many inadequately answered questions on the gamedev Stack Exchange, too; also many low-effort forum posts).
To summarize my research: It seems that the compositing window manager (DWM) used in newer versions of Windows forces triple buffering, and that interferes with vsync. People suggest disabling DWM, not using vsync, or going fullscreen, all of which are not a solution to the original problem (FOOTNOTE1). I have also not found a detailed explanation why triple buffering causes this issue with vsync, or why it is technologically not possible to solve the problem.
However: I've also tested that this does not occur on Linux, even on VERY weak PCs. Therefore it must be technically possible (at least in general) for OpenGL-based hardware acceleration to have functional vsync enabled without skipping frames.
Also, this is not a problem when using D3D instead of OpenGL on Windows (with vsync enabled). Therefore it must be technically possible to have working vsync on Windows (I have tried new, old, and very old drivers and different (old and new) hardware, although all the hardware setups I have available are Intel + NVidia, so I don't know what happens with AMD/ATI).
And lastly, there surely must be software for Windows, be it games, multimedia applications, creative production, 3D modeling/rendering programs or whatever, that use OpenGL and work properly in windowed mode while still rendering accurately, without busy-waiting on the CPU, and without frame drops.
I've noticed that, when having a traditional rendering loop like
while (true)
{
    poll_all_events_in_event_queue();
    process_things();
    render();
}
The amount of work the CPU has to do in that loop affects the behavior of the stuttering. However, this is most definitely not an issue of the CPU being overloaded, as the problem also occurs in one of the most simple programs one could write (see below), and on a very powerful system that does nothing else (the program being nothing other than clearing the window with a different color on each frame, and then displaying it).
I've also noticed that it never seems to get worse than skipping every other frame (i.e., in my tests, the visible framerate was always somewhere between 30 and 60 on a 60 Hz system). You can observe somewhat of a Nyquist sampling theorem violation when running a program that alternates the background color between 2 colors on odd and even frames, which makes me believe that something is not synchronized properly (i.e. a software bug in Windows or its OpenGL implementation). Again, the framerate as far as the CPU is concerned is rock solid. Also, timeBeginPeriod has had no noticeable effect in my tests.
(FOOTNOTE1) It should be noted though that, because of the DWM, tearing does not occur in windowed mode (which is one of the two main reasons to use vsync, the other reason being making the CPU sleep for the maximum amount of time possible without missing a frame). So it would be acceptable for me to have a solution that implements vsync in the application layer.
However, the only way I see that being possible is if there is a way to explicitly (and accurately) wait for a page flip to occur (with the possibility of timeout or cancellation), or to query a non-sticky flag that is set when the page is flipped (in a way that doesn't force flushing the entire asynchronous render pipeline, like glGetError does, for example), and I have not found a reliable way to do either.
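The closest candidates seem to be the DWM timing functions, though I have not verified that they actually help with the skipped frames; this is only a sketch of what waiting on and querying the compositor would look like (DwmFlush blocks until the DWM has finished a composition pass, and DwmGetCompositionTimingInfo exposes the compositor's vblank and refresh counters):
#include <windows.h>
#include <dwmapi.h>
#pragma comment(lib, "dwmapi.lib")

// Sketch only - not a confirmed fix for the skipped frames described above.
void wait_for_compositor_and_query()
{
    DwmFlush(); // returns after the next DWM composition

    DWM_TIMING_INFO info = {};
    info.cbSize = sizeof(info);
    if (SUCCEEDED(DwmGetCompositionTimingInfo(NULL, &info)))
    {
        // info.qpcVBlank   : QPC timestamp of the last vblank the compositor saw
        // info.cRefresh    : monotonically increasing refresh counter
        // info.rateRefresh : refresh rate as a ratio (uiNumerator / uiDenominator)
    }
}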
Here is some code to get a quick example running that demonstrates this problem (using SFML, which I found to be the least painful to get to work).
You should see homogeneous flashing. If you ever see the same color (black or purple) for more than one frame, it's bad.
(This flashes the screen with the display's refresh rate, so maybe epilepsy warning):
// g++ TEST_TEST_TEST.cpp -lsfml-system -lsfml-window -lsfml-graphics -lGL
#include <SFML/System.hpp>
#include <SFML/Window.hpp>
#include <SFML/Graphics.hpp>
#include <SFML/OpenGL.hpp>
#include <iostream>

int main()
{
    // create the window
    sf::RenderWindow window(sf::VideoMode(800, 600), "OpenGL");
    window.setVerticalSyncEnabled(true);

    // activate the window
    window.setActive(true);

    int frame_counter = 0;

    sf::RectangleShape rect;
    rect.setSize(sf::Vector2f(10, 10));

    sf::Clock clock;

    while (true)
    {
        // handle events
        sf::Event event;
        while (window.pollEvent(event))
        {
            if (event.type == sf::Event::Closed)
            {
                return 0;
            }
        }

        ++frame_counter;
        if (frame_counter & 1)
        {
            glClearColor(0, 0, 0, 1);
        }
        else
        {
            glClearColor(60.0/255.0, 50.0/255.0, 75.0/255.0, 1);
        }

        // clear the buffers
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

        // Enable this to display a column of rectangles on each frame
        // All colors (and positions) should pop up the same amount
        // This shows that apparently, 1 frame is skipped at most
#if 0
        int fc_mod = frame_counter % 8;
        int color_mod = fc_mod % 4;
        for (int i = 0; i < 30; ++i)
        {
            rect.setPosition(fc_mod * 20 + 10, i * 20 + 10);
            rect.setFillColor(
                sf::Color(
                    (color_mod == 0 || color_mod == 3) ? 255 : 0,
                    (color_mod == 0 || color_mod == 2) ? 255 : 0,
                    (color_mod == 1) ? 155 : 0,
                    255
                )
            );
            window.draw(rect);
        }
#endif

        int elapsed_ms = clock.restart().asMilliseconds();
        // NOTE: These numbers are only valid for 60 Hz displays
        if (elapsed_ms > 17 || elapsed_ms < 15)
        {
            // Ideally you should NEVER see this message, but it does tend to
            // stutter a bit for a second or so upon program startup - doesn't
            // matter as long as it stops eventually
            std::cout << elapsed_ms << std::endl;
        }

        // end the current frame (internally swaps the front and back buffers)
        window.display();
    }

    return 0;
}
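Side note: sf::Time::asMilliseconds() truncates to whole milliseconds. For slightly finer numbers than the 15..17 ms check above, the same measurement can be done in microseconds (assuming SFML 2.x):
// Drop-in variant of the timing check in the loop above, in microseconds.
sf::Int64 elapsed_us = clock.restart().asMicroseconds();
if (elapsed_us > 17000 || elapsed_us < 15000)
{
    std::cout << (elapsed_us / 1000.0) << " ms" << std::endl;
}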
System info:
Verified this problem on these systems:
Windows 10 x64 i7-4790K + GeForce 970 (verified that problem does not occur on Linux here) (single 60 Hz monitor)
Windows 7 x64 i5-2320 + GeForce 560 (single 60 Hz monitor)
Windows 10 x64 Intel Core2 Duo T6400 + GeForce 9600M GT (verified that problem does not occur on Linux here) (single 60 Hz laptop display)
And 2 other people using Windows 10 x64 and 7 x64 respectively, both "beefy gaming rigs", can request specs if necessary
UPDATE 20170815
Some additional testing I've done:
I tried adding explicit sleeps (via the SFML library, which basically just calls Sleep from the Windows API while ensuring that timeBeginPeriod is minimal).
With my 60 Hz setup, a frame should ideally take 16 2/3 ms. According to QueryPerformanceCounter measurements, my system is, most of the time, very accurate with those sleeps.
Adding a sleep of 17 ms causes me to render slower than the refresh rate. When I do this, some frames are displayed twice (this is expected), but NO frames are dropped, ever. The same is true for even longer sleeps.
Adding a sleep of 16 ms sometimes causes a frame to be displayed twice, and sometimes causes a frame to be dropped. This is plausible in my opinion, considering a more or less random combination of the result at 17 ms, and the result at no sleep at all.
Adding a sleep of 15 ms behaves very similarly to having no sleep at all. It's fine for a short moment, then about every 2nd frame is dropped. The same is true for all values from 1 ms to 15 ms.
This reinforced my theory that the problem might be nothing other than some plain old concurrency bug in the vsync logic in the OpenGL implementation or the operating system.
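For reference, the sleep experiment boils down to roughly this (my own sketch, not the exact SFML-based code; render_and_swap() stands in for the actual frame):
#include <windows.h>
#include <cstdio>
#pragma comment(lib, "winmm.lib")

void render_and_swap(); // placeholder: clear + swap as in the demo program above

// Sketch of the experiment: timeBeginPeriod(1) makes Sleep() accurate to about
// 1 ms, and QueryPerformanceCounter measures the real per-iteration time.
void run_sleep_test(int sleep_ms)
{
    timeBeginPeriod(1);
    LARGE_INTEGER freq, last, now;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&last);
    for (;;)
    {
        render_and_swap();
        Sleep(sleep_ms); // 17 -> no drops, 16 -> mixed, <= 15 -> drops (observed above)
        QueryPerformanceCounter(&now);
        printf("%.3f ms\n", 1000.0 * double(now.QuadPart - last.QuadPart) / double(freq.QuadPart));
        last = now;
    }
    // timeEndPeriod(1); // unreachable here, but required for every timeBeginPeriod
}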
I also did more tests on Linux. I hadn't really looked much into it before - I merely verified that the frame drop problem didn't exist there and that the CPU was, in fact, sleeping most of the time. I realised that, depending on several factors, I can make tearing consistently occur on my test machine, despite vsync. As of yet, I do not know whether that issue is connected to the original problem, or if it is something entirely different.
It seems the better approach would be to ditch vsync altogether and implement everything in the application layer, with some gnarly workarounds and hacks (because apparently in 2017 we can't get the most basic frame rendering right with OpenGL).
UPDATE 20170816
I have tried to "reverse-engineer" a bunch of open source 3D engines (got hung up on obbg (https://github.com/nothings/obbg) in particular).
First, I checked that the problem does not occur there. The frame rate is butter smooth. Then, I added my good old flashing purple/black with the colored rects and saw that the stuttering was indeed minimal.
I started ripping out the guts of the program until I ended up with a simple program like mine. I found that there is some code in obbg's rendering loop that, when removed, causes heavy stutter (namely, rendering the main part of the obbg ingame world). Also, there is some code in the initialization that also causes stutter when removed (namely, enabling multisampling). After a few hours of fiddling around it seems that OpenGL needs a certain amount of workload to function properly, but I have yet to find out what exactly needs to be done. Maybe rendering a million random triangles or something will do.
I also realised that all my existing tests behave slightly differently today. It seems that I have overall fewer, but more randomly distributed, frame drops today than the days before.
I also created a better demo project that uses OpenGL more directly, and since obbg used SDL, I also switched to that (although I briefly looked over the library implementations and it would surprise me if there was a difference, but then again this entire ordeal is a surprise anyway). I wanted to approach the "working" state from both the obbg-based side and the blank-project side, so I can be really sure what the problem is. I just put all the required SDL binaries inside the project; as long as you have Visual Studio 2017 there should be no additional dependencies and it should build right away. There are many #ifs that control what is being tested.
https://github.com/bplu4t2f/sdl_test
During the creation of that thing I also took another look how SDL's D3D implementation behaves. I had tested this previously, but perhaps not quite extensively enough. There were still no duplicate frames and no frame drops at all, which is good, but in this test program I implemented a more accurate clock.
To my surprise I realised that, when using D3D instead of OpenGL, many (but not the majority) loop iterations take somewhere between 17.0 and 17.2 ms (I would not have caught that with my previous test programs). This does not happen with OpenGL. The OpenGL rendering loop is consistently in the range 15.0 .. 17.0. If it is true that sometimes there needs to be a slightly longer waiting period for the vertical blank (for whatever reason), then OpenGL seems to miss that. That might be the root cause of the entire thing?
Yet another day of literally staring at a flickering computer screen. I have to say I really did not expect to spend that amount of time on rendering nothing but a flickering background and I'm not particularly fond of that.
I'm rendering a top-down, tile-based world using OpenGL 3.3, with fully streamed VBOs.
After encountering some lag I did some benchmarking and what I found was horrid!
Let me explain the picture. The first marked square is me running my game using the simplest of shaders. There is no lighting, nothing! I'm simply uploading 5000 vertices and drawing them. My memory load is about 20-30%, CPU load 30-40%.
The second is with lighting. Every light is uploaded as an array to the fragment shader and every fragment processes the lights. Load is about 40-50%; 100% with 60 lights.
The third is with deferred shading. First I draw normals and diffuse to an FBO, then I render each light to the default framebuffer while reading from these. Load is about 80%, basically unaffected by the number of lights.
These are the scenes I render:
As you can see, there's nothing fancy. It's retro style. My plan has been to add tons of complexity and still run smoothly on low-end computers. Mine is an i7 with an NVIDIA 660M, so it shouldn't have a problem.
For comparison I ran Warcraft 3 and it took about 50-60% load, 20% memory.
One strange thing I've noticed is that if I disable vsync and don't call glFinish before swapping buffers, load goes down significantly. However, the clock goes up and heat is produced (53 °C).
Now, first I'm wondering if you think this is normal. If not, then what could be my bottleneck? Could it be my streaming VBO? I've tried double buffering and orphaning, but nothing changed. Doubling the number of sprites basically increases the memory load by 5-10%; the GPU load remains basically the same.
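For reference, by "orphaning" I mean the usual re-specify-or-invalidate pattern, roughly like this (vbo, vertices and bufferSize are placeholders for my actual streaming code; the GL 3.x functions are assumed to be loaded as usual):
#include <cstring>

void upload_streamed(GLuint vbo, const void* vertices, GLsizeiptr bufferSize)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);

    // Variant A: orphan the old storage, then upload into the fresh one
    glBufferData(GL_ARRAY_BUFFER, bufferSize, NULL, GL_STREAM_DRAW);
    glBufferSubData(GL_ARRAY_BUFFER, 0, bufferSize, vertices);

    // Variant B (alternative): same idea via an invalidating map
    // void* dst = glMapBufferRange(GL_ARRAY_BUFFER, 0, bufferSize,
    //                              GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
    // memcpy(dst, vertices, (size_t)bufferSize);
    // glUnmapBuffer(GL_ARRAY_BUFFER);
}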
I'm aware this question can't be easily answered, but I'll provide more details as you require them. Don't want to post my 20000 lines of code here.
Oh, and one more thing... It fluctuates. The draw calls are identical, but the load can go from 2-100%, whenever it feels like it.
UPDATE:
My main loop looks like this:
swapbuffers
renderAndDoGlCalls
updateGameAndPoll
sleep if there's any time left (1/60th second)
repeat.
Without vsync, glFlush or glFinish, this results in the following percentages used:
swap: 0.16934400677376027
ren: 0.9929640397185616
upp:0.007698000307920012
poll:0.0615780024631201
sleep: 100.39487801579511
With glFinish prior to swapbuffers:
swap: 26.609977064399082 (this usually goes up to 80%)
ren: 1.231584049263362
upp:0.010266000410640016
poll:0.07697400307896013
sleep: 74.01582296063292
With vsync it starts well, usually the same as with glFinish, then bam:
swap: 197.84934791397393
ren: 1.221324048852962
upp:0.007698000307920012
poll:0.05644800225792009
sleep: 0.002562000102480004
And it stays that way.
Let me clarify... If I call swapbuffers right after all the OpenGL calls, my CPU stalls for 70% of the update time, letting me do nothing. By ordering the loop the way I do, I give the GPU the longest possible time to finish the backbuffer before I call the swap again.
You are actually inadvertently causing the opposite scenario.
The only time SwapBuffers causes the calling thread to stall is when the pre-rendered frame queue is full and it has to wait for VSYNC to flush a finished frame. The CPU could easily be a good 2-3 frames ahead of the GPU at any given moment, and it is not the current frame finishing that causes waiting (there's already a finished frame that needs to be swapped in this scenario).
Waiting happens because the driver cannot swap the backbuffer from back to front until the VBLANK signal rolls around (which only occurs once every 16.667ms). The driver will actually continue to accept commands while it is waiting for a swap up until it hits a certain limit (pre-rendered frames on NVIDIA hardware / flip queue size on AMD) worth of queued swaps. Once that limit is hit, GL commands will cause blocking until the back buffer(s) is/are swapped.
You are sleeping at the end of your frames, so no appreciable CPU/GPU parallelism ever develops; in fact you are more likely to skip a frame this way.
That is what you are seeing here. The absolute worst-case scenario is when your sleep makes you 1 ms too late to swap buffers in time for VBLANK. Your time between two frames then becomes 16.66667 + 15.66667 = 32.33334 ms. This causes a stutter that would not have happened if you did not add your own wait time. The driver could easily have copied the backbuffer from back to front and continued accepting commands in that 1 extra ms you added, but instead it blocks for an additional ~15.7 ms at the beginning of the next frame.
To avoid this, you want to swap buffers as soon as possible after all commands for a frame have been issued. You have the best likelihood of meeting the VBLANK deadline this way. Reported CPU usage may go up since less time is spent sleeping, but performance should be measured using frame time rather than scheduled CPU time.
VSYNC and the pre-rendered frame limit discussed will keep your CPU and GPU from running out of control and generating huge amounts of heat as mentioned in the question.
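In loop form, the ordering I am suggesting is roughly this (a sketch with placeholder names, not your actual code):
// Issue the frame's GL commands, swap immediately, and let the driver's frame
// queue absorb the waiting; no manual Sleep(). 'hdc' is the window's device
// context, the other calls are placeholders.
while (running)
{
    poll_events();
    update_game();
    render_gl_calls();  // all draw calls for this frame
    SwapBuffers(hdc);   // swap as soon as the frame's commands have been issued;
                        // blocking only happens once the pre-rendered frame
                        // queue is actually full
}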
I'm getting a repeating lag in my OpenGL application.
I'm using the Win32 API to create the window, and I'm also creating a 2.2 context.
So the main loop of the program is very simple:
Clearing the color buffer
Drawing a triangle
Swapping the buffers.
The triangle is rotating; that's how I can see the lag.
Also my frame time isn't smooth which may be the problem.
But I'm very very sure the delta time calculation is correct because I've tried plenty ways.
Do you think it could be a graphic driver problem?
Because a friend of mine ran almost exactly the same program, except that I do fewer calculations and I'm using the standard OpenGL shader.
Also, his program uses more CPU power than mine, and its CPU usage is smoother than mine.
I should also add:
On my laptop I get the same lag every ~1 second, so I can see some kind of pattern.
There are many reasons for a jittery frame rate. Off the top of my head:
Not calling glFlush() at the end of each frame
other running software interfering
doing things in your code that certain graphics drivers don't like
bugs in graphics drivers
Using the standard Windows time functions with their terrible resolution
Try these:
kill as many running programs as you can get away with. Use the process tab in the task manager (CTRL-SHIFT-ESC) for this.
bit by bit, reduce the amount of work your program is doing and see how that affects the frame rate and the smoothness of the display.
if you can, try enabling/disabling vertical sync (you may be able to do this in your graphic card's settings) to see if that helps
add some debug code to output the time taken to draw each frame, and see if there are anomalies in the numbers, e.g. every 20th frame taking an extra 20ms, or random frames taking 100ms.
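For the last suggestion, something along these lines is enough (a QueryPerformanceCounter-based sketch; render_frame() stands in for your own draw + swap code):
#include <windows.h>
#include <cstdio>

void render_frame(); // placeholder for the application's drawing and buffer swap

void timed_loop()
{
    LARGE_INTEGER freq, prev, now;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&prev);
    for (int frame = 0; ; ++frame)
    {
        render_frame();
        QueryPerformanceCounter(&now);
        double ms = 1000.0 * double(now.QuadPart - prev.QuadPart) / double(freq.QuadPart);
        prev = now;
        if (ms > 20.0) // tune the threshold to your refresh rate
            printf("frame %d took %.2f ms\n", frame, ms);
    }
}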
In an MFC program I built myself, I have some weird problems with CPU usage.
I load a point cloud of around 360k points and everything works fine (I use VBOs, which is the way to do it from what I understand?). I can move it around as I please and notice no adverse effects (CPU usage is very low; the GPU does all the work). But then at certain angles and zoom values I see the CPU spike on one of my processors! I can then change the angle or zoom a little and it will go down to around 0 again. This is more likely to happen in a large window than a small one.
I measure the FPS of the program and it's constantly at 65, but when the CPU spike hits it typically goes down around 10 units to 55. I also measure the time SwapBuffers take and during normal operation it's around 0-1 ms. Once the CPU spike hits it goes up to around 20 ms, so it's clear something suddenly gets very hard to calculate in that function (for the GPU I guess?). This something is not in the DrawScene function (which is the function one would expect to eat CPU in a poor implementation), so I'm at a bit of a loss.
I know it's not due to the number of points visible because this can just as easily happen on just a sub-section of the data as on the whole cloud. I've tried to move it around and see if it's related to the depth buffer, clipping or similar but it seems entirely random what angles create the problem. It does seem somewhat repeatable though; moving the model to a position that was laggy once will be laggy when moved there again.
I'm very new at OpenGL so it's not impossible I've made some totally obvious error.
This is what the render loop looks like (it's run in an MFC app via a timer event with 1 ms period):
// Clear color and depth buffer bits
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
// Draw OpenGL scene
OGLDrawScene();
unsigned int time1 = timeGetTime();
// Swap buffers
SwapBuffers(hdc);
// Calculate execution time for SwapBuffers
m_time = timeGetTime() - time1;
// Calculate FPS
++m_cnt;
if (timeGetTime() - m_lastTime > 1000)
{
    m_fps = m_cnt;
    m_cnt = 0;
    m_lastTime = timeGetTime();
}
I've noticed that (at least a while back), ATI drivers tend to like to spin-wait a little too aggressively, while NVidia drivers tend to be interrupt driven. Technically spin-waiting is faster, assuming you have nothing better to do. Unfortunately, today you probably do have something better to do on another thread.
I think the OP's display drivers may indeed be spin-waiting.
Ok, so I think I have figured out how this can happen.
To begin with, WM_TIMER messages don't seem to be generated more often than every 15 ms at best, even when using timeBeginPeriod(1), at least on my computer. This resulted in the standard 65 fps I was seeing.
As soon as the scene took more than 15 ms to render, SwapBuffers would be the limiting factor instead. SwapBuffers seem to busy-wait, which resulted in 100% CPU usage on one core when this happened. This is not something that occurred at certain camera angles, but is a result of a fluidly changing fps depending on how many points were shown on the screen at the time. It just appeared to spike whenever rendering happened to hit a time over 15 ms and started to wait at SwapBuffers instead.
On a similar note, does anyone know of a function like "glReadyToSwap" or something like that? A function that indicates whether the buffers are ready to be swapped? That way I could choose another method for rendering with a higher timer resolution (1 ms for example) and then each ms check if the buffers are ready to swap; if they aren't, just wait another ms, so as to not busy-wait.
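The closest approximation I can think of is a GL sync object (needs a GL 3.2+ context or ARB_sync, with the functions loaded as usual): a fence placed after the frame's commands can be polled without blocking. It only reports that the GPU has finished those commands, not that the vblank has arrived, but it would at least avoid issuing a swap that is going to sit in a busy-wait. A sketch with placeholder names:
// Call after issuing all GL commands for the frame.
GLsync place_frame_fence()
{
    return glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
}

// Call e.g. on each 1 ms timer tick; swaps only when the GPU has finished.
bool try_swap(GLsync& fence, HDC hdc)
{
    GLenum state = glClientWaitSync(fence, 0, 0); // zero timeout -> pure poll
    if (state == GL_ALREADY_SIGNALED || state == GL_CONDITION_SATISFIED)
    {
        glDeleteSync(fence);
        fence = 0;
        SwapBuffers(hdc); // much less likely to sit in a busy-wait now
        return true;
    }
    return false; // not finished yet; poll again on the next tick
}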
I have 200 frames to be displayed per second. The frames are very very simple, black and white, just a couple of lines. A timer is driving the animation. The goal is to play back the frames at approx. 200 fps.
Under Linux I set the timer to 5 ms and I let it display every frame (that is 200fps). Works just fine but it fails under Win 7.
Under Win 7 (the same machine) I had to set the timer to 20 ms and let it display every 4th frame (50 fps × 4 = 200). I found these magic numbers by trial and error.
What should I do to guarantee (within reasonable limits) that the animation will be played back at a proper speed on the user's machine?
For example, what if the user's machine can only do 30 fps or 60 fps?
The short answer is, you can't (in general).
For best aesthetics, most windowing systems have "vsync" on by default, meaning that screen redraws happen at the refresh rate of the monitor. In the old CRT days, you might be able to get 75-90 Hz with a high-end monitor, but with today's LCDs you're likely stuck at 60 fps.
That said, there are OpenGL extensions that can disable VSync (don't remember the extension name off hand) programmatically, and you can frequently disable it at the driver level. However, no matter what you do (barring custom hardware), you're not going to be able to display complete frames at 200 fps.
Now, it's not clear if you've got pre-rendered images that you need to display at 200 fps, or if you're rendering from scratch and hoping to achieve 200 fps. If it's the former, a good option might be to use a timer to determine which frame you should display (at each 60 Hz. update), and use that value to linearly interpolate between two of the pre-rendered frames. If it's the latter, I'd just use the timer to control motion (or whatever is dynamic in your scene) and render the appropriate scene given the time. Faster hardware or disabled VSYNC will give you more frames (hence smoother animation, modulo the tearing) in the same amount of time, etc. But the scene will unfold at the right pace either way.
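A sketch of the first option (placeholder names; frames[] would hold your pre-rendered 200 Hz frames, elapsedSeconds comes from your own timer, and bounds checking is omitted):
double pos   = elapsedSeconds * 200.0;       // position in 200 Hz frame units
int    i0    = (int)pos;                     // pre-rendered frame just before 'now'
int    i1    = i0 + 1;                       // pre-rendered frame just after 'now'
double blend = pos - i0;                     // 0..1 interpolation factor
draw_blended(frames[i0], frames[i1], blend); // e.g. crossfade the two images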
Hope this is helpful. We might be able to give you better advice if you give a little more info on your application and where the 200 fps requirement originates.
I've already read that you have data sampled at 200 Hz, which you want to play back at natural speed, i.e. one second of sampled data shall be rendered over one second of time.
First: Forget about using timers to coordinate your rendering, this is unlikely to work properly. Instead you should measure the time a full rendering cycle (including v-sync) takes and advance the animation time-counter by this. Now 200Hz is already some very good time resolution, so if the data is smooth enough, then there should be no need to interpolate at all. So something like this (Pseudocode):
objects[]   # the objects, animated by the animation
animation[] # steps of the animation, sampled at 200Hz

ANIMATION_RATE = 200. # animation steps per second. Of course this shouldn't be
                      # hardcoded, but loaded with the animation data

animationStep = 0
timeLastFrame = None

drawGL():
    timeNow = now() # time in seconds with (at least) ms-accuracy
    if timeLastFrame:
        stepTime = timeNow - timeLastFrame
    else:
        stepTime = 0
    animationStep = round(animationStep + stepTime * ANIMATION_RATE)
    drawObjects(objects, animation[animationStep])
    timeLastFrame = timeNow
It may be that your rendering is much faster than the time between screen refreshes. In that case you may want to render some of the intermediate steps, too, to get some kind of motion blur effect (you can also use the animation data to obtain motion vectors, which can be used in a shader to create a vector blur effect). The render loop would then look like this:
drawGL():
    timeNow = now() # time in seconds with (at least) ms-accuracy
    if timeLastFrame:
        stepTime = timeNow - timeLastFrame
    else:
        stepTime = 0

    timeRenderStart = now()
    animationStep = round(animationStep + stepTime * ANIMATION_RATE)
    drawObjects(objects, animation[animationStep])
    glFinish() # don't call SwapBuffers
    timeRender = now() - timeRenderStart

    setup_GL_for_motion_blur()
    intermediates = floor(stepTime / timeRender) - 1 # subtract one to get some margin
    if intermediates > 0:
        backstep = ANIMATION_RATE * (stepTime / intermediates) # animation steps between passes
        for i in 0 to intermediates:
            drawObjects(objects, animation[animationStep - i * backstep])

    timeLastFrame = timeNow
One way is to sleep for 1ms at each iteration of your loop and check how much time has passed.
If more than the target amount of time has passed (for 200fps that is 1000/200 = 5ms), then draw a frame. Else, continue to the next iteration of the loop.
E.g. some pseudo-code:
target_time = 1000/200;  // 200 fps => 5 ms target time.
timer = new timer();     // Define a timer by whatever method is permitted in your
                         // implementation.
while (true) {
    if (timer.elapsed_time < target_time) {
        sleep(1);
        continue;
    }
    timer.reset();                   // Reset your timer to begin counting again.
    do_your_draw_operations_here();  // Do some drawing.
}
This method has the advantage that if the user's machine is not capable of 200fps, you will still draw as fast as possible, and sleep will never be called.
There are probably two totally independent factors to consider here:
How fast is the users machine? It could be that you are not achieving your target frame rate due to the fact that the machine is still processing the last frame by the time it is ready to start drawing the next frame.
What is the resolution of the timers you are using? My impression (although I have no evidence to back this up) is that timers under Windows operating systems provide far poorer resolution than those under Linux. So you might be requesting a sleep of (for example) 5 ms, and getting a sleep of 15 ms instead.
Further testing should help you figure out which of these two scenarios is more pertinent to your situation.
If the problem is a lack of processing power, you can choose to display intermediate frames (as you are doing now), or degrade the visuals (lower quality, lower resolution, or anything else that might help speed things up).
If the problem is timer resolution, you can look at alternative timer APIs (the Windows API provides two different timer function calls, each with different resolutions; perhaps you are using the wrong one), or try to compensate by asking for smaller time slices (as in Kdoto's suggestion). However, doing this may actually degrade performance, since you're now doing a lot more processing than you were before - you may notice your CPU usage spike under this method.
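If it does turn out to be Sleep() granularity, you can shorten the scheduler tick for the lifetime of the animation loop via the multimedia timer API (a sketch; the loop body is a placeholder, and every timeBeginPeriod must be paired with a matching timeEndPeriod):
#include <windows.h>
#pragma comment(lib, "winmm.lib")

void run_animation_with_fine_timer()
{
    timeBeginPeriod(1); // raise resolution so Sleep(5) sleeps ~5 ms, not ~15.6 ms
    // ... animation loop using Sleep()/timers goes here (placeholder) ...
    timeEndPeriod(1);   // always restore the previous resolution
}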
Edit:
As Drew Hall mentions in his answer, there's another whole side to this: the refresh rate you get in code may be very different from the actual refresh rate appearing on screen. However, that's output-device dependent, and it sounds from your question like the issue is in code rather than in output hardware.