I'd like to get more performance out of BitBlt for capturing the screen. When Aero remains enabled, capturing only a 400x400 pixel area of the screen reduces capture time from an average of 50ms (full 1920x1200) to about 33ms (for 400x400) on my machine. This is a disappointingly low improvement.
Is my only option disabling Aero? I do know that I can get a blindingly fast 3ms capture of the full screen when Aero is disabled.
A screen capture in Aero mode is very expensive because it requires synchronization with the DWM (think of waiting for everyone to stop blinking before taking a group photo) and, on Vista, locking the entire GPU pipe. On Windows 7, GDI performance is improved by accelerating common GDI operations and reducing GDI locking.
There have been attempts to get at the shared surface used by the DWM, either through undocumented DWM functions or through hooks. But you still have to pay the price of moving the data from video memory to system RAM via the sometimes slow FSB if you plan to process the image data (e.g. send it over a network or save it to a file).
Disable Aero for now, I guess. I can't seem to get much above 15 fps out of it, presumably because it's locking on 30 fps boundaries: I can't "get in" to grab a frame until the next boundary starts, and then I have to wait for the whole period to elapse.
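A rough way to see why the rate would land on 15 fps, assuming (as guessed above, not confirmed) that a capture can only complete on a 30 Hz boundary: anything that takes longer than one 33.3 ms period gets rounded up to two periods. The function name `quantized_fps` and the 30 Hz figure are illustrative assumptions, not part of any Windows API.

```cpp
#include <cassert>
#include <cmath>

// Effective rate when each operation can only finish on a fixed tick boundary.
// work_ms: time the capture itself takes; tick_hz: assumed boundary rate.
double quantized_fps(double work_ms, double tick_hz) {
    assert(work_ms > 0.0 && tick_hz > 0.0);
    const double period_ms = 1000.0 / tick_hz;
    // Round the work up to a whole number of tick periods.
    const double periods = std::ceil(work_ms / period_ms);
    return 1000.0 / (periods * period_ms);
}
```

With the 50 ms full-screen figure from the question, `quantized_fps(50.0, 30.0)` comes out at about 15, while a 3 ms capture would get the full 30.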
I am working on a graphical application that supports multiple operating systems and graphical back ends. The window is created with GLFW and the graphics API is chosen at runtime. When running the program on Windows and using OpenGL, VSync seems to be broken. The frame rate is locked at 60 fps, yet screen tearing artifacts appear. Following the GLFW documentation, glfwSwapInterval(0); should unlock the frame rate from the default of using VSync. That works as expected. Using glfwSwapInterval(1); should lock the frame rate to match the monitor's refresh rate. Not calling glfwSwapInterval(); at all should default to using VSync. While the frame rate is correctly locked / unlocked using these calls, I experienced some extremely interesting behaviours.
When glfwSwapInterval(); isn't called at all, VSync is set by default. But the wait for the next frame happens at the first draw call! One would think that the delay for the next frame would happen at glfwSwapBuffers(). No screen artifacts are visible whatsoever.
When calling glfwSwapInterval(1);, VSync is set and the delay for the next frame happens at glfwSwapBuffers()! That's great; however, when explicitly setting VSync, screen tearing artifacts appear.
Right now, not calling glfwSwapInterval() in order to use VSync seems to be a hacky solution, but:
The user wouldn't be able to disable VSync without window reconstruction,
The profiler identifies the first draw call taking way too long, as VSync wait time is somehow happening there.
I have tried fiddling with GPU driver settings and testing the code on multiple machines. The problem is persistent across machines when using Windows and OpenGL.
If anyone can make any sense of this, please share, or if I am misunderstanding something, I would greatly appreciate some pointers in the right direction.
EDIT:
Some other detail: the tearing happens at a specific horizontal line. The rest of the frame seems to work properly.
After doing some more tests, it seems that everything is working as intended on integrated graphics. Correct me if I am wrong, but it looks like it is a graphics driver issue.
I am working on an Ogre application in which I set real-time views as the background of my window. However, I have a question: when I try to get my application's frame rate using RenderTarget::getAverageFPS(), I get 19.7433. Is this the right frame rate?
And how can I change this frame rate to, for example, 30 fps or 40 fps?
Unless your application is locked to the screen's vsync, you can't just change your framerate. You have to optimize your rendering, so that you can render within whatever framerate you desire. Or alternatively, render less stuff.
So if you want to render a frame 30 times a second, your rendering (and everything else) must happen within 1/30th of a second.
In short: Ogre is probably not directly the cause of your framerate. What you're telling Ogre to do is.
Note that you should be checking this in an optimized, release build, not debug. Debug builds are slow (because they're for debugging).
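To put numbers on the frame-budget point, here is a small sketch that turns a target rate into a per-frame time budget and compares it with the 19.7433 fps reported in the question. The helper names are made up for illustration; nothing here is Ogre API.

```cpp
#include <cassert>

// Time available per frame, in milliseconds, at a given frame rate.
double frame_budget_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// How many milliseconds per frame must be saved to get from the measured
// rate to the target rate (0 if the target is already met).
double ms_to_shave(double measured_fps, double target_fps) {
    const double over = frame_budget_ms(measured_fps) - frame_budget_ms(target_fps);
    return over > 0.0 ? over : 0.0;
}
```

At 19.7433 fps a frame takes roughly 50.7 ms, while 30 fps allows about 33.3 ms, so `ms_to_shave(19.7433, 30.0)` says a little over 17 ms of work per frame has to go.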
I created a widget that serves as a kind of popup window and hence should have a drop shadow all around to visually raise it from the background. I initialize the drop shadow effect in the constructor of my popup widget as follows:
dropshadow = new QGraphicsDropShadowEffect(this);
dropshadow->setBlurRadius(32);
dropshadow->setColor(QColor("#121212"));
dropshadow->setOffset(0,0);
setGraphicsEffect(dropshadow);
The application runs on an embedded system with an Intel Atom CPU and a custom Linux distribution, using Qt v4.7.3 with a QWS server. When I disable the drop shadow, my CPU usage is less than 10% when the GUI is idle. Enabling the drop shadow raises the CPU usage to more than 80%. Profiling the app shows that most of the CPU time is spent within libQtGui.so.4.7.3.
Does anyone have an idea why the CPU usage explodes like this even though there is absolutely nothing going on in the GUI, not even mouse movement?
Edit: Changing the size of the popup changes the amount of CPU usage. Reducing the size to a quarter reduces the CPU usage to about a quarter. Very strange.
The problem was only partly with the drop shadow. It seems that repainting a drop shadow requires quite a lot of CPU time - which is OK if it is not redrawn too often. The problem was simple really. The widget that was behind this popup was redrawn four to five times per second and hence, the popup needed to be redrawn, too. This swallowed huge amounts of CPU time. The solution is equally simple: Avoid repaint events if nothing really changes on screen.
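A minimal sketch of the "avoid repaints if nothing changes" idea, written in plain C++ so it stands on its own. `StatusLabel` is a hypothetical stand-in for the widget behind the popup; in a real Qt widget the counter increment is where `update()` would be called.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a widget: counts how often a repaint is requested.
class StatusLabel {
public:
    // Request a repaint only when the displayed value actually changes.
    void setText(std::string text) {
        if (text == text_) return;   // nothing changed on screen: skip the repaint
        text_ = std::move(text);
        ++repaint_requests_;         // in Qt, this is where update() would go
    }

    int repaintRequests() const { return repaint_requests_; }

private:
    std::string text_;
    int repaint_requests_ = 0;
};
```

A timer that pushes the same value four or five times per second then triggers no repaint at all, so anything stacked on top (such as the shadowed popup) is left alone too.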
I'm currently writing a game of immense sophistication and cunning, that will fill you with awe and won- oh, OK, it's the 15 puzzle, and I'm just familiarising myself with SDL.
I'm running in windowed mode, and using SDL_Flip as the general-case page update, since it maps automatically to an SDL_UpdateRect of the full window in windowed mode. Not the optimum approach, but given that this is just the 15 puzzle...
Anyway, the tile moves are happening at ludicrous speed. IOW, SDL_Flip in windowed mode doesn't include any synchronisation with vertical retraces. I'm working in Windows XP ATM, but I assume this is correct behaviour for SDL and will occur on other platforms too.
Switching to using SDL_UpdateRect obviously won't change anything. Presumably, I need to implement the delay logic in my own code. But a simple clock-based timer could result in updates occurring when the window is half-drawn, causing visible distortions (I forget the technical name).
EDIT This problem is known as "tearing".
So - in a windowed mode game in SDL, how do I synchronise my page-flips with the vertical retrace?
EDIT I have seen several claims, while searching for a solution, that it is impossible to synchronise page-flips to the vertical retrace in a windowed application. On Windows, at least, this is simply false - I have written games (by which I mean things on a similar level to the 15-puzzle) that do this. I once wasted some time playing with Dark Basic and the Dark GDK - both DirectX-based and both synchronising page-flips to the vertical retrace in windowed mode.
Major Edit
It turns out I should have spent more time looking before asking. From the SDL FAQ...
http://sdl.beuc.net/sdl.wiki/FAQ_Double_Buffering_is_Tearing
That seems to imply quite strongly that synchronising with the vertical retrace isn't supported in SDL windowed-mode apps.
But...
The basic technique is possible on Windows, and I'm beginning to think SDL does it, in a sense. Just not quite certain yet.
On Windows, I said before, synchronising page-flips to vertical syncs in Windowed mode has been possible all the way back to the 16-bit days using WinG. It turns out that that's not exactly wrong, but misleading. I dug out some old source code using WinG, and there was a timer triggering the page-blits. WinG will run at ludicrous speed, just as I was surprised by SDL doing - the blit-to-screen page-flip operations don't wait for a vertical retrace.
On further investigation - when you do a blit to the screen in WinG, the blit is queued for later and the call exits. The blit is executed at the next vertical retrace, so hopefully no tearing. If you do further blits to the screen (dirty rectangles) before that retrace, they are combined. If you do loads of full-screen blits before the vertical retrace, you are rendering frames that are never displayed.
This blit-to-screen in WinG is obviously similar to the SDL_UpdateRect. SDL_UpdateRects is just an optimised way to manually combine some dirty rectangles (and be sure, perhaps, they are applied to the same frame). So maybe (on platforms where vertical retrace stuff is possible) it is being done in SDL, similarly to in WinG - no waiting, but no tearing either.
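One simple way to combine pending dirty rectangles is to keep their bounding box until the next retrace. The sketch below only illustrates that idea; it is not a description of what WinG or SDL do internally, and the `Rect` type is a hypothetical stand-in (SDL has its own SDL_Rect).

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical rectangle type: top-left corner plus width and height.
struct Rect { int x, y, w, h; };

// Smallest rectangle that covers both inputs.
Rect unite(const Rect& a, const Rect& b) {
    assert(a.w >= 0 && a.h >= 0 && b.w >= 0 && b.h >= 0);
    const int left   = std::min(a.x, b.x);
    const int top    = std::min(a.y, b.y);
    const int right  = std::max(a.x + a.w, b.x + b.w);
    const int bottom = std::max(a.y + a.h, b.y + b.h);
    return Rect{left, top, right - left, bottom - top};
}
```

The trade-off is that two small rectangles in opposite corners merge into one nearly full-screen update, which is why a list of separate rectangles (as SDL_UpdateRects takes) can be cheaper.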
Well, I tested using a timer to trigger the frame updates, and the result (on Windows XP) is uncertain. I could get very slight and occasional tearing on my ancient laptop, but that may be no fault of SDL's - it could be that the "raster" is outrunning the blit. This is probably my fault for using SDL_Flip instead of a direct call to SDL_UpdateRect with a minimal dirty rectangle - though I was trying to get tearing in this case, to see if I could.
So I'm still uncertain, but it may be that windowed-mode SDL is as immune to tearing as it can be on those platforms that allow it. Results don't seem as bad as I imagined, even on my ancient laptop.
But - can anyone offer a definitive answer?
You can use the framerate control of SDL_gfx.
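Looking at the docs of the library, the flow of your application will be like this:
#include "SDL_framerate.h"   // from SDL_gfx; the include path may differ on your system
// initialization code
FPSmanager fpsManager;            // a real object, not an uninitialized pointer
SDL_initFramerate(&fpsManager);
SDL_setFramerate(&fpsManager, 60 /* desired FPS */);
// in the render loop, once per frame
SDL_framerateDelay(&fpsManager);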
Also, you may look at the source code to create your own framerate control.
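If you would rather roll your own, a frame limiter can be sketched in a few lines of standard C++. This is not SDL_gfx's implementation: it uses std::chrono instead of SDL's timers, and the class name `FrameLimiter` is made up for the example. Note that it only paces the loop; like any timer-based approach it does not synchronise with the vertical retrace.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical fixed-rate limiter: call wait() once per frame after rendering.
class FrameLimiter {
public:
    using Clock = std::chrono::steady_clock;

    explicit FrameLimiter(double fps)
        : period_(toPeriod(fps)), next_(Clock::now() + period_) {}

    void wait() {
        const auto now = Clock::now();
        if (now < next_) {
            // Ahead of schedule: sleep until the next frame is due.
            std::this_thread::sleep_until(next_);
            next_ += period_;
        } else {
            // Running late: restart the schedule instead of trying to catch up.
            next_ = now + period_;
        }
    }

    Clock::duration period() const { return period_; }

private:
    static Clock::duration toPeriod(double fps) {
        assert(fps > 0.0);
        return std::chrono::duration_cast<Clock::duration>(
            std::chrono::duration<double>(1.0 / fps));
    }

    Clock::duration period_;   // time between frames
    Clock::time_point next_;   // when the next frame is due
};
```

In a render loop this would be `FrameLimiter limiter(60.0);` before the loop and `limiter.wait();` at the end of each iteration.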
I noticed in an MFC application I'm developing that while dragging the scroll bar to smoothly scroll down the document, the framerate drops to choppy levels when a block containing about a paragraph of text is on screen, but silky smooth when it's offscreen. Investigating the performance, I found the single CDC::DrawText call for the paragraph of text responsible. This is in an optimised release build.
I used QueryPerformanceCounter to get a high-resolution measurement of just the DrawText call, like this:
QueryPerformanceCounter(...);
pDC->DrawText(some_cstring, some_crect, DT_WORDBREAK);
QueryPerformanceCounter(...);
The text is unicode, lorem-ipsum style filler, 865 characters long and wraps over 7-and-a-bit lines given the rectangle and font (Segoe UI, lfHeight = -12, a standard body text size). From my measurements, that call alone takes on average 7.5 ms, with the odd peak at 21ms. (Note to keep up with a 60Hz monitor you get about 16ms to render each update.)
I tried making some changes to improve the performance:
Removing the DT_WORDBREAK improves performance to about 1ms (about 7 times faster), but given only one line of text is making it to the screen, and there were just over 7 lines with word breaking, this seems to suggest to me the bottleneck is elsewhere.
I was drawing text in transparent mode (SetBkMode(TRANSPARENT)). So I tried opaque mode with a solid background fill. No improvement.
I thought ClearType rendering might be to blame. I changed the font lfQuality from CLEARTYPE_QUALITY to NONANTIALIASED_QUALITY. It looked like crap with sharp edges and all, and no improvement.
As per a comment suggestion, I was using a CMemDC, but I got rid of it and did direct drawing. It flickered like mad, and no improvement.
This is running on a Windows 7 64-bit laptop with an Intel Core 2 Duo P8400 @ 2.26 GHz and 4 GB RAM - I don't think it counts as a slow system.
I'm calling DrawText() every time it draws and this obviously hammers the performance with such a slow function, especially if several of those text-blocks are visible at once. It's enough to make the experience feel sluggish. However, Firefox can render a page like this one in ClearType with much more text, and seems to cope just fine. What am I doing wrong? How can I get around the poor performance of an actual DrawText call?
Drawing the text at every refresh is wasteful. Use double buffering, that is, draw in an offscreen bitmap and just blit it to the screen. Then, for scrolling, just copy most of the bitmap up or down or sideways as necessary, then draw only the invalidated area (before blitting the result to the screen).
If even that turns out to be too slow, keep also the drawn text in an off-screen bitmap, and blit instead of draw.
Cheers & hth.,
According to this German blog post, the issue has to do with support for Asian language fonts. If you enable those in XP you get the same perf hit. In Vista/7, they are enabled by default and you can't turn them off.
EDIT: Just maybe, using a different font might help (one that does not contain Asian characters).
Users can't read 7 lines of text in 7 milliseconds, so the call itself is fast enough.
The 60 Hz refresh rate of the monitor is entirely irrelevant. You don't need to re-render the same text for every frame. The videocard will happily send the same pixels to the screen again.
So, I think you have another problem. Are you perhaps wondering about scrolling text? Please ask about the problem you really have, instead of assuming DrawText is the culprit.
In order to break the text on word breaks, DrawText needs to repeatedly try to get the width of a block of text to see if it will fit, then take the remainder and do it over. It will need to do this at every call. If your text is unchanging, this is an unnecessary overhead. As a workaround, you could measure the text yourself and insert temporary line breaks and remove the DT_WORDBREAK flag.
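The workaround above could look roughly like this: wrap the text once, keep the resulting lines, and draw them without DT_WORDBREAK on every paint. This is a sketch under assumptions: `wrap_words` and `MeasureFn` are names invented here, the measuring callback is supplied by the caller (with GDI it could wrap something like GetTextExtentPoint32 on the target DC), and narrow strings are used for brevity even though the question's text is Unicode.

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// measure(text) returns the rendered width of text in pixels (caller-supplied).
using MeasureFn = std::function<int(const std::string&)>;

// Greedy word wrap: break at whitespace so each line fits within max_width.
// A single word wider than max_width is left unbroken on its own line.
std::vector<std::string> wrap_words(const std::string& text, int max_width,
                                    const MeasureFn& measure) {
    assert(max_width > 0);
    std::vector<std::string> lines;
    std::istringstream words(text);
    std::string word, line;
    while (words >> word) {
        const std::string candidate = line.empty() ? word : line + " " + word;
        if (line.empty() || measure(candidate) <= max_width) {
            line = candidate;          // the word still fits on the current line
        } else {
            lines.push_back(line);     // current line is full: start a new one
            line = word;
        }
    }
    if (!line.empty()) lines.push_back(line);
    return lines;
}
```

The wrapping only needs to be redone when the text, the font, or the rectangle width changes, so the per-paint cost drops to drawing a handful of pre-broken lines.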
Have you considered Direct2D/DirectWrite?
Anyway it should work better if you just draw the text once to its own mem dc and blit that over to whatever dc you want it painted on with each iteration.