I'm having some unexpected performance issues with my HaxeFlixel game when building for the Windows (cpp) target with the following settings:
<window if="cpp" width="480" height="270" fps="60" background="#000000"
hardware="false" vsync="true" />
I notice that when I resize the window to a bigger resolution, or go full-screen at up to 1920x1080, the game becomes slow and laggy. However, according to the flixel debug console, the frame rate is the same at all resolutions.
Something even more interesting is that my Flash export runs much more smoothly, while I expected the cpp target to run faster.
It's a 2d platform game with about 6 tilemaps (The biggest tilemap is 1600x1440) and 32x32 or 16x16 sprites. I did not expect to have performance issues on any modern system. So I'm concerned that I'm doing something wrong like missing an obvious setting.
Is this normal? Are there any key rendering performance factors I should check? Please feel free to ask me for any details if you think this would help.
I'm using HaxeFlixel 3.3.12.
I think this may be a common problem among all the C++ targets. I experienced this with the Linux native target for my game as well. My solution was to disable anti-aliasing via
<window antialiasing="0" />
Of course, this works best with pixel art and not 3D or HD stuff. And then there's still the potential problem of performance dipping at higher resolutions (retina displays and whatnot). But this might be sufficient as a stopgap solution.
I've been testing my app with different configurations, and I finally found out that turning off the vsync option makes the biggest impact. There is some vertical jittering, but the game finally runs fast, and the Windows target is faster than Flash.
It turns out that my current laptop has an Intel HD GPU, and the vsync feature seems to be broken on it. I remember that my previous PC, equipped with a low-end AMD GPU, didn't have this issue.
I will consider adding an in-game option to toggle vsync, so that non-Intel users can benefit from it.
Other things that seem to have helped are:
Switching off antialiasing, as Jon O suggested
Turning hardware on
For reference, my current setting is
<window if="cpp" width="960" height="540" fps="60" background="#000000" hardware="true" vsync="false" antialiasing="0" />
Related
How can we benchmark performance of Qt Wayland on a hardware platform?
Are there any benchmarking tools like "glmark2-es2", which is used for standard OpenGL benchmarking? We need this to decide whether we can use the Qt Wayland compositor or have to use Wayland.
glmark2 works on Wayland as well; however, in its current condition it is not a good measure of actual performance. It tries to render frames as fast as possible, regardless of how fast the compositor is actually able to show them. This means most of the frames are wasted and never shown on the screen. So what it usually measures is how good the compositor is at ignoring superfluous frames from a misbehaving client (most clients wait for the compositor to tell them to draw a new frame, so that rendering stays close to vsync). In fact, a compositor locked at a ridiculously low frame rate can achieve a high glmark2 score more easily than one running at a steady 60fps.
Instead, it's better to use a tool that tries to increase the workload per frame while keeping the frame-rate constant at 60fps.
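The idea of raising the per-frame workload while holding the frame rate at 60fps can be sketched roughly as follows. This is only an illustration in Python: `simulated_frame_ms` is a hypothetical cost model standing in for real rendering, and nothing here reflects how qmlbench is actually implemented.

```python
FRAME_BUDGET_MS = 1000.0 / 60.0  # time available per frame at 60fps


def simulated_frame_ms(items, cost_per_item_ms=0.01, fixed_cost_ms=2.0):
    """Hypothetical cost model: fixed overhead plus a cost per drawn item."""
    return fixed_cost_ms + items * cost_per_item_ms


def max_sustainable_items(frame_ms, budget_ms=FRAME_BUDGET_MS, limit=1_000_000):
    """Largest workload whose frame time still fits in the budget.

    Uses binary search, so it assumes the frame time grows monotonically
    with the workload.
    """
    lo, hi = 0, limit
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if frame_ms(mid) <= budget_ms:
            lo = mid        # still within budget, try a heavier workload
        else:
            hi = mid - 1    # frame would be late, back off
    return lo


# The benchmark score is the workload sustained at 60fps, not a raw frame count.
score = max_sustainable_items(simulated_frame_ms)
```

In a real benchmark the frame time would come from measuring presented frames, so a compositor that drops frames cannot inflate the score.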
If you're using Qt anyway, then one such tool is https://github.com/CrimsonAS/qmlbench. You'll probably be able to find others if you want something toolkit independent.
EDIT: If you want more of my rants about why glmark2-es is a horrible tool to benchmark compositors, see http://blog.qt.io/blog/2017/05/31/qt-wayland-summary/#comment-1200024
I just picked up a new Lenovo Thinkpad that comes with Intel HD Graphics 3000. I'm finding that my old freeglut apps, which use GLUT_MULTISAMPLE, are running at 2 or 3 fps as opposed to the expected 60fps. Even the freeglut example 'shapes' runs this slow.
If I disable GLUT_MULTISAMPLE from shapes.c (or my app) things run quickly again.
I tried multisampling on glfw (using GLFW_FSAA - or whatever that hint is called), and I think it's working fine. This was with a different app (glgears). glfw is triggering Norton Internet Security, which thinks it's malware and so keeps removing .exes... but that's another problem... my interest is with freeglut.
I wonder if the algorithm that freeglut uses to choose a pixel format is tripping up on this card, whereas glfw is choosing the right one.
Has anyone else come across something like this? Any ideas?
That glfw triggers Norton is a bug in Norton's virus definitions. If it's still the case with the latest definitions, send them your glfw dll/app so they can fix it. The same happens with Avira, and they are working on it (they have already confirmed that it's a false positive).
As for the HD3000, that's quite a weak GPU. What resolution is your app running at, and how many samples are you using? Maybe the amount of framebuffer memory gets too high for the little guy?
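To get a feel for how quickly multisampling inflates framebuffer memory, here is a rough back-of-the-envelope sketch in Python. It assumes 4 bytes of color and 4 bytes of depth/stencil per sample; the resolution and sample count are made-up values, since the question does not state them, and real drivers may pad or compress the buffers.

```python
def multisample_framebuffer_bytes(width, height, samples,
                                  color_bytes=4, depth_stencil_bytes=4):
    """Rough size of a multisampled framebuffer.

    Every sample stores its own color and depth/stencil value, so the
    cost scales linearly with the sample count.
    """
    return width * height * samples * (color_bytes + depth_stencil_bytes)


# Hypothetical example: a 1920x1080 window with 4 samples per pixel.
size_bytes = multisample_framebuffer_bytes(1920, 1080, 4)
size_mib = size_bytes / (1024 * 1024)  # about 63 MiB under these assumptions
```

On an integrated GPU this memory is carved out of system RAM, and every frame has to read and write all of it.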
Nowadays we have pretty advanced tools to iron out rendering problems, letting us see the different stages, the time taken by draw calls, etc. But without them the graphics pipeline is quite a black box when it comes to understanding what is happening inside.
Suppose for some reason you have no such tool, or only a very limited one. How would you still measure what is taking time in your rendering?
I am aware of tricks like discarding draw calls to see the CPU time, setting a 1x1 viewport to see the cost of geometry, using a dumb fragment shader to highlight the fillrate... They are useful already but only give a rough idea of what is going on, and tell nothing about the level of parallelism.
Also, getting the time spent in each stage per draw call seems to be difficult, especially given the lack of precision caused by measurement noise.
What tricks do you use when your backpack is almost empty and you still have to profile your rendering? What does your personal Swiss army knife consist of?
Frame rendering time
The absolute time spent in a small piece of code, a stage, etc. is not that relevant, as GPU driver optimization, batching, parallelism and driver versions make it nearly impossible to get precise measurements without GPU counters (which you can get if you use the vendors' libraries).
What you can measure easily is the impact of each single code change. You'll only get the relative impact, but that's what you really need anyway, and it only requires the frame rendering time.
Ideally you should aim to be able to edit shader or pipeline code at runtime, and have a direct way to check the impact over a whole typical scene, for example by comparing graphs between several code paths. (Beware of static scenes, otherwise you'll end up with highly optimized static views but poor performance in dynamic scenes.)
Here's the Swiss army knife list:
scene states loader
scene recorder (camera paths/add-remove entities,texture, mesh, fake input, etc.) using scene states.
scene states saver
scene frame time logger (not just final average but each frame rendering time)
on-the-fly shader code reload
on-the-fly codepath switch
frame time log reader+graphs+statistic framework
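As a minimal sketch of the frame time log reader and statistics part of that list, assuming the log is simply a list of per-frame render times in milliseconds (the function names here are hypothetical, not from any particular engine):

```python
import statistics


def summarize_frame_times(frame_ms):
    """Summarize a per-frame log (milliseconds) instead of one average."""
    ordered = sorted(frame_ms)
    # Nearest-rank 95th percentile: ceil(0.95 * n), as a 1-based rank.
    rank = max(1, -(-95 * len(ordered) // 100))
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        "worst": ordered[-1],
    }


def relative_change(before_ms, after_ms):
    """Relative change in mean frame time between two code paths.

    Negative means the second code path is faster.
    """
    before = statistics.mean(before_ms)
    after = statistics.mean(after_ms)
    return (after - before) / before


# Hypothetical logs recorded over the same scene with two code paths.
summary = summarize_frame_times([16.0, 17.0, 16.0, 33.0, 16.0])
change = relative_change([20.0, 20.0], [18.0, 18.0])
```

Keeping the worst and 95th percentile frames next to the mean makes spikes visible that an average would hide.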
Note that scene state load/save/record are handy for a lot of other things, from debugging to undo/redo to on-the-fly reload, not to mention savegames.
Add a screenshot taker + image diff, and you can unit test graphic code too.
If you can, add that to your CI server so that huge code impact doesn't go unnoticed. (helps also artists when they check-in their assets, without evaluating rendering impact)
A must-read on this kind of CI graphics testing is here: http://aras-p.info/blog/2011/06/17/testing-graphics-code-4-years-later/
Note: I'm responding to the question "Profiling a graphics rendering with a profiler", since that's something I was looking for ;)
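The screenshot plus image diff idea can be sketched like this in Python. It assumes images are already decoded into lists of rows of (r, g, b) tuples; how the screenshot is captured and loaded is left out, and the tolerance value is an arbitrary choice.

```python
def count_differing_pixels(expected, actual, tolerance=0):
    """Count pixels where any channel differs by more than `tolerance`.

    Both images are lists of rows of (r, g, b) tuples and must have the
    same dimensions.
    """
    if len(expected) != len(actual) or any(
        len(row_e) != len(row_a) for row_e, row_a in zip(expected, actual)
    ):
        raise ValueError("image sizes differ")
    differing = 0
    for row_e, row_a in zip(expected, actual):
        for pixel_e, pixel_a in zip(row_e, row_a):
            if any(abs(ce - ca) > tolerance for ce, ca in zip(pixel_e, pixel_a)):
                differing += 1
    return differing


# Hypothetical 2x2 reference image and a render with one slightly off pixel.
reference = [[(0, 0, 0), (255, 255, 255)], [(10, 20, 30), (40, 50, 60)]]
rendered = [[(0, 0, 0), (255, 255, 255)], [(10, 23, 30), (40, 50, 60)]]

exact = count_differing_pixels(reference, rendered)        # strict comparison
tolerant = count_differing_pixels(reference, rendered, 4)  # allows small drift
```

A small tolerance helps because different GPUs and drivers rarely produce bit-identical output.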
I'm working mostly on Mac, and I'm using multiple tools:
gDebugger version 5.8 is available on Windows and Mac (this tool has been bought by AMD, and the v6 version is Windows only). It gives you statistics about state changes, texture usage, draw calls, etc. It's also useful for debugging texture mapping and for seeing how your scene is drawn, step by step.
PVRUniSCoEditor is a shader editor. It compiles on the fly and gives you precious details about estimated cycles and register usage.
Instruments (from the Xcode utilities, OS X only) gets information from the OpenGL driver. It's great for finding bottlenecks, since you can track which part of the GPU is used at 100% (tiler, renderer, texture unit, etc.).
Adreno Profiler is a Windows tool to profile Adreno-based mobile devices. (A very good tool if you work on Android apps ;))
What's your trick with the "dumb fragment shader to highlight the fillrate"? (Drawing a plain color, or something more advanced?)
I have been working on an OpenGL project. It just displays a boat moving along, with some options for changing the view. It's a 2D program. I have used many glTranslate calls in the code to move the boat. It works properly on Windows (Dev-C++), but when run on Fedora the boat moves very, very slowly. When I checked the CPU load, it was huge. Is there anything I can try to make the boat move faster?
Please help :)
It's most likely that you don't have hardware acceleration in your Fedora installation. Check that you've got the proprietary drivers from NVIDIA (or whoever manufactures your video card) installed.
Also, do other OpenGL programs run fast?
It's probably the fact that Intel has a horrible Linux driver.
When you say you have too many glTranslate calls, how many is too many? Thousands per frame? If you have a lot of translate calls made back to back, you can always add them up by hand and then call glTranslate once with the sum. I'd be skeptical that this is what's slowing your machine down, though.
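Why back-to-back translations can be added up: multiplying two translation matrices gives a translation by the sum of the offsets. The Python sketch below checks this with plain 4x4 matrices; it is not OpenGL code, and it only holds when no rotation or scale is applied between the translations.

```python
def translation(tx, ty, tz):
    """4x4 translation matrix as a list of rows."""
    return [
        [1, 0, 0, tx],
        [0, 1, 0, ty],
        [0, 0, 1, tz],
        [0, 0, 0, 1],
    ]


def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


# Two consecutive translations (arbitrary example offsets)...
combined = matmul(translation(1, 2, 0), translation(3, -1, 0))
# ...are equivalent to a single translation by the summed offsets.
single = translation(1 + 3, 2 + (-1), 0 + 0)
```

So a run of consecutive glTranslate calls can be replaced by one call with the accumulated x, y and z values.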
I've got an application which drops to around 10fps. I profiled it with xperf, which showed that my app was using just 20% of the CPU, with none of my methods taking a larger than expected share of that 20%.
This seems to indicate that the vast drop in fps is because the graphics card isn't able to keep up with rendering the frame, resulting in my program stalling while it catches up...
Is there some way to profile what the graphics card is up to and work out what my program is telling it to do that's slowing it down, so that I can try to improve the frame rate?
For debugging / profiling graphics, try Nvidia PerfHUD
NVIDIA PerfHUD is a powerful real-time performance analysis tool for Direct3D applications.
There is also an ATI solution, called 'GPU PerfStudio'
GPU PerfStudio is a real-time performance analysis tool which has been designed to help tune the graphics performance of your DirectX 9, DirectX 10, and OpenGL applications. GPU PerfStudio displays real-time API, driver and hardware data which can be visualized using extremely flexible plotting and bar chart mechanisms. The application being profiled may be executed locally or remotely over the network. GPU PerfStudio allows the developer to override key rendering states in real-time for rapid bottleneck detection. An auto-analysis window can be used for identifying performance issues at various stages of the graphics pipeline. No special drivers or code modifications are needed to use GPU PerfStudio.
You can find more information and download links here:
http://developer.nvidia.com/object/nvperfhud_home.html
http://developer.amd.com/tools-and-sdks/graphics-development/gpu-perfstudio/
Also, check out this article on FPS:
FPS vs Frame Time
Basically it talks about the fact that a drop from 200fps to 190fps is negligible, whereas a drop from 30fps to 20fps is a MUCH bigger deal. For better performance measuring, you should be calculating frame time rather than FPS.
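To make the comparison concrete, here is a small Python sketch that converts the FPS figures above into frame times (the helper names are made up for illustration):

```python
def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def added_frame_time_ms(fps_before, fps_after):
    """Extra milliseconds per frame after a drop in frame rate."""
    return frame_time_ms(fps_after) - frame_time_ms(fps_before)


small_drop = added_frame_time_ms(200, 190)  # roughly 0.26 ms more per frame
large_drop = added_frame_time_ms(30, 20)    # roughly 16.67 ms more per frame
```

Both drops are 10fps, yet the second one costs about 60 times more time per frame, which is why frame time is the more honest metric.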
You never told us what your fps is or what the program is doing at all, so your "vast drop" might not be a big deal at all.
For DirectX, there is PIX for profiling the CPU and GPU operations. It can give very detailed info, and might be worth looking into.
Hope that helps!
You can try using dxprof (search for it on Google). It's a lightweight app that draws real-time bars, each bar corresponding to one DirectX event (such as a draw call or a resource copy). You can freeze the bars and check the call stack to find out where a draw call originates from.
Are you developing for Windows? If so, avoid using Video for Windows, as this will limit you in the manner that you describe. Use DirectX instead.
No need to guess. Just pause it a few times under the IDE, and it will show you exactly what it's waiting for.