Graphics Profiling - C++

I've got an application which drops to around 10fps. I profiled it with xperf, which showed my app was using just 20% of the CPU, with none of my methods using a larger than expected amount of that 20%.
This seems to indicate that the vast drop in fps is because the graphics card isn't able to keep up with rendering the frames, causing my program to stall while it catches up...
Is there some way to profile what the graphics card is up to and work out what my program is telling it to do that's slowing it down, so that I can try to improve the frame rate?

For debugging/profiling graphics, try NVIDIA PerfHUD:
NVIDIA PerfHUD is a powerful real-time performance analysis tool for Direct3D applications.
There is also an ATI solution, called 'GPU PerfStudio':
GPU PerfStudio is a real-time performance analysis tool which has been designed to help tune the graphics performance of your DirectX 9, DirectX 10, and OpenGL applications. GPU PerfStudio displays real-time API, driver and hardware data which can be visualized using extremely flexible plotting and bar chart mechanisms. The application being profiled may be executed locally or remotely over the network. GPU PerfStudio allows the developer to override key rendering states in real time for rapid bottleneck detection. An auto-analysis window can be used for identifying performance issues at various stages of the graphics pipeline. No special drivers or code modifications are needed to use GPU PerfStudio.
You can find more information and download links here:
http://developer.nvidia.com/object/nvperfhud_home.html
http://developer.amd.com/tools-and-sdks/graphics-development/gpu-perfstudio/

Also, check out this article on FPS:
FPS vs Frame Time
Basically, it explains that a drop from 200fps to 190fps is negligible, whereas a drop from 30fps to 20fps is a MUCH bigger deal. For better performance measurement, you should be working with frame time rather than FPS.
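The point is easy to see if you convert FPS to milliseconds per frame; here is a tiny illustrative sketch of that arithmetic (not taken from the linked article):

```cpp
#include <cstdio>

// Frame time (in milliseconds) corresponding to a given frame rate.
double msPerFrame(double fps) { return 1000.0 / fps; }

int main() {
    // A 10 fps drop means very different things at different frame rates:
    std::printf("200 -> 190 fps: +%.2f ms per frame\n",
                msPerFrame(190) - msPerFrame(200));   // ~0.26 ms of extra work
    std::printf(" 30 ->  20 fps: +%.2f ms per frame\n",
                msPerFrame(20) - msPerFrame(30));     // ~16.7 ms of extra work
}
```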
You never told us what your fps is or what the program is doing at all, so your "vast drop" might not be a big deal at all.
For DirectX, there is PIX for profiling the CPU and GPU operations. It can give very detailed info, and might be worth looking into.
Hope that helps!

You can try using dxprof (search for it on Google). It's a lightweight app that draws real-time bars, each bar corresponding to one DirectX event (such as a draw call or a resource copy). You can freeze the bars and check the call stack to find out where the draw call originates from.

Are you developing for Windows? If so, avoid using Video for Windows, as this will limit you in the manner that you describe. Use DirectX instead.

No need to guess. Just pause it a few times under the IDE, and it will show you exactly what it's waiting for.

Related

How to stream OpenGL rendered scene from the cloud to remote clients

So I have a desktop app, using OpenGL to render large data sets in 3D. I want to move it to the cloud and use server-side rendering in order to stream the rendered images to remote clients (JS, etc.).
From what I understand, WebRTC is the best approach for that. However, it's complicated and expensive to implement, and mainly aimed at video conferencing applications. Are there any frameworks or open-source projects that are more suitable for 3D graphics streaming? Is Nvidia's GameStreaming a suitable technology to explore, or is it tailored for games? Any other ideas and approaches?
There are many ideas and approaches, and which one works best depends a lot on your particular application, budget, client, and server.
If you render on the server side, the big advantage is that you control the GPU, the available memory, the OS and driver version, etc so cross-platform or OS version problems largely disappear.
But now you're sending every frame pixel by pixel to the user. (And MPEG-4 isn't great when compressing visualization rather than video.)
And you've got a network latency delay on every keystroke, or mouse click, or mouse movement.
And if tens? hundreds? thousands? of people want to use your app simultaneously, you've got to have enough server side CPU/GPU to handle that many users.
So yeah, it's complicated and expensive to implement, no matter what you choose. As well as WebRTC, you could also look at screen sharing software such as VNC. Nvidia game streaming might be a more suitable technology to explore, because there's a lot of similarity between 3D games and 3D visualisation, but don't expect it to be a magic bullet.
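To make the "sending every frame pixel by pixel" cost concrete, here is a minimal sketch of the server-side readback step any such pipeline needs before the encoder. The encodeAndSend() function is a hypothetical stand-in for whatever encoder/transport you pick, and an existing OpenGL context is assumed; none of this comes from a specific framework.

```cpp
#include <GL/gl.h>      // on Windows, include <windows.h> first
#include <cstdint>
#include <vector>

// Hypothetical hook into your encoder/transport (WebRTC, NVENC, VNC, ...).
void encodeAndSend(const std::vector<std::uint8_t>& rgba, int width, int height);

// Call once per frame, after rendering but before swapping buffers.
void streamFrame(int width, int height) {
    std::vector<std::uint8_t> rgba(static_cast<std::size_t>(width) * height * 4);
    glPixelStorei(GL_PACK_ALIGNMENT, 1);   // tightly packed rows
    glReadBuffer(GL_BACK);                 // read the just-rendered frame
    glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, rgba.data());
    // Note: glReadPixels stalls the pipeline; a real implementation would use
    // pixel buffer objects or a GPU-side encoder to hide that latency.
    encodeAndSend(rgba, width, height);
}
```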
Have you looked at WebGL? It's the slightly cut-down, OpenGL ES-based version of OpenGL for JavaScript. If you're not making heavy use of advanced OpenGL 4 capabilities, a lot of OpenGL C/C++ code translates without too much difficulty into JavaScript and WebGL. And just about every web browser on the planet runs WebGL, even if (like Apple) the platform manufacturer discourages regular OpenGL.
The big advantage is that all the rendering and interactivity happens on the client, so latency is not a problem and you're not paying for the CPU/GPU if lots of people want to run it at the same time.
Hope this helps.

How can we benchmark performance of Qt Wayland on a hardware platform?

How can we benchmark performance of Qt Wayland on a hardware platform?
Do we have any benchmarking tools like "glmark2-es2", which is used for standard OpenGL benchmarking? This is required to see whether we can use the Qt Wayland compositor or have to use plain Wayland.
glmark2 works on Wayland as well; however, in its current condition it is not a good measure of actual performance. It tries to render frames as fast as possible, regardless of how fast the compositor is actually able to show them. This means most of the frames are wasted and never shown on the screen. So what it usually measures is how good the compositor is at ignoring superfluous frames from a misbehaving client (most clients wait for the compositor to tell them to draw a new frame, so they can stay close to vsync). In fact, a compositor locked at a ridiculously low frame rate can more easily achieve a high glmark2 score than one running at a steady 60fps.
Instead, it's better to use a tool that tries to increase the workload per frame while keeping the frame-rate constant at 60fps.
If you're using Qt anyway, then one such tool is https://github.com/CrimsonAS/qmlbench. You'll probably be able to find others if you want something toolkit independent.
EDIT: If you want more of my rants about why glmark2-es is a horrible tool to benchmark compositors, see http://blog.qt.io/blog/2017/05/31/qt-wayland-summary/#comment-1200024
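For context, the "grow the per-frame workload while holding 60fps" idea boils down to a loop like the sketch below. The drawScene() and presentAndWaitForFrame() calls are hypothetical placeholders, and this is not qmlbench's actual algorithm, just an illustration of the measurement principle.

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical hooks into the scene: draw 'workload' units of work, then
// present and wait for the compositor's frame callback / vsync.
void drawScene(int workload);
void presentAndWaitForFrame();

int findSustainableWorkload() {
    using clock = std::chrono::steady_clock;
    const double budgetMs = 1000.0 / 60.0;    // one 60fps frame
    int workload = 1, lastSustainable = 0;
    for (;;) {
        const int frames = 120;               // average over ~2 seconds
        auto start = clock::now();
        for (int i = 0; i < frames; ++i) {
            drawScene(workload);
            presentAndWaitForFrame();
        }
        double avgMs = std::chrono::duration<double, std::milli>(
                           clock::now() - start).count() / frames;
        std::printf("workload %d -> %.2f ms/frame\n", workload, avgMs);
        if (avgMs > budgetMs)                 // we started missing 60fps,
            return lastSustainable;           // so report the previous level
        lastSustainable = workload;
        workload *= 2;                        // still on budget, push harder
    }
}
```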

Real time ray tracer

I would like to make a basic real-time CPU ray tracer in C++ (mainly for learning purposes). This tutorial was great for making a basic ray tracer, but what would be the best solution for drawing the result on the screen in real time? I'm not asking how to optimize the ray-tracing part, just the painting part, so that it paints to the screen and not to a file.
I'm developing on/for Windows.
You could check out this Code Project article on the basic paint mechanism using the Win32 API.
Update: the OP wants fast drawing, which the Win32 API does not provide. The OP needs this so that they can measure the speedup of the ray-tracing algorithm during the optimization process. Other possibilities for drawing are DirectX, XNA, Allegro, and OpenGL.
I work professionally on a real-time CPU ray tracer, and from what I've seen over two years of work there, the GPU side of displaying the image won't be the bottleneck; if you hit one, it will be the speed of your RAM. I don't think the drawing technology will make any significant difference.
As an example, we use clustering (one CPU is not enough :p), and we were able to render 100-200fps at 1920x1080 when looking at the sky, but the bottleneck was not the display part, it was the network...
EDIT: We are using OpenGL for the display.
When you write a CPU ray tracer you are not going to call something like printPixelToGPU() per pixel; you write to your RAM and then send the image to the GPU once it is finished. Doing printPixelToGPU() would probably cause a big overhead and is (in my opinion) a really bad design choice.
It looks like premature optimization. But if you are still concerned about it, just benchmark how many RAM-texture-to-GPU transfers you can do per second with OpenGL, DirectX, etc., and print the average frame rate. You will probably see that the frame rate is really, really high, so you will almost certainly never hit that "bottleneck" unless you are using SDRAM or a really poor GPU.
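Since the display path mentioned above is OpenGL, here is a minimal sketch of the "render into RAM, then send the finished image to the GPU" approach it describes. It assumes an existing OpenGL context and a pixels buffer filled by the ray tracer; texture creation happens once, the update runs every frame.

```cpp
#include <GL/gl.h>      // on Windows, include <windows.h> first
#include <cstdint>
#include <vector>

GLuint createDisplayTexture(int width, int height) {
    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    // Allocate storage once; the contents are streamed in every frame.
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    return tex;
}

// Per frame: copy the CPU-side image (filled by the ray tracer) to the GPU,
// then draw it as a fullscreen textured quad (fixed-function GL for brevity).
void presentFrame(GLuint tex, const std::vector<std::uint8_t>& pixels,
                  int width, int height) {
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    GL_RGBA, GL_UNSIGNED_BYTE, pixels.data());
    glEnable(GL_TEXTURE_2D);
    glBegin(GL_QUADS);
    glTexCoord2f(0, 0); glVertex2f(-1, -1);
    glTexCoord2f(1, 0); glVertex2f( 1, -1);
    glTexCoord2f(1, 1); glVertex2f( 1,  1);
    glTexCoord2f(0, 1); glVertex2f(-1,  1);
    glEnd();
    // Swap buffers with whatever windowing layer you use (Win32, GLFW, SDL...).
}
```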

Profiling a graphics rendering without a profiler

Nowadays we have pretty advanced tools for ironing out rendering, which let us see the different stages, the time taken by draw calls, and so on. Without them, the graphics pipeline is quite a black box when it comes to understanding what is happening inside.
Suppose for some reason you have no such tool, or only a very limited one. How would you measure what is taking time in your rendering anyway?
I am aware of tricks like discarding draw calls to see the CPU time, setting a 1x1 viewport to see the cost of geometry, or using a dumb fragment shader to highlight the fill rate... They are already useful, but they only give a rough idea of what is going on and say nothing about the level of parallelism.
Also, getting the time spent in each stage per draw call seems difficult, especially when taking into account the lack of precision due to measurement noise.
What tricks do you use when your backpack is almost empty and you still have to profile your rendering? What does your personal Swiss army knife consist of?
Frame rendering time
The absolute time spent in a small piece of code/stage/etc. is not that relevant, as GPU driver optimization/batching/parallelism/version makes it nearly impossible to measure code precisely without GPU counters (which you can get if you use the vendors' libs).
What you can measure easily is the impact of each single code change. You'll only get its relative impact, but that's what you really need anyway. And you get it just from frame rendering time.
Ideally you should aim to be able to edit shader or pipeline code at runtime, and to have a direct way to check the impact over a whole typical scene, e.g. by comparing graphs between several code paths. (Beware of static scenes, otherwise you'll end up with highly optimized static views but poor dynamic-scene performance.)
Here's the swiss army knife list:
scene states loader
scene recorder (camera paths/add-remove entities,texture, mesh, fake input, etc.) using scene states.
scene states saver
scene frame time logger (not just the final average but each frame's rendering time; see the sketch after this list)
on-the-fly shader code reload
on-the-fly codepath switch
frame time log reader+graphs+statistic framework
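As referenced in the "scene frame time logger" item, here is a minimal sketch of such a logger, writing one line per frame to a CSV so the graphing/statistics framework can pick it up later. The renderFrame() function and the file path are placeholders, not part of any particular engine.

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical stand-in for the engine's frame function.
void renderFrame();

// Logs every frame's duration, not just an average, so spikes stay visible
// when the log is graphed later.
void runAndLogFrames(int frameCount, const char* csvPath) {
    using clock = std::chrono::steady_clock;
    std::FILE* log = std::fopen(csvPath, "w");
    if (!log) return;
    std::fprintf(log, "frame,ms\n");
    for (int i = 0; i < frameCount; ++i) {
        auto start = clock::now();
        renderFrame();
        double ms = std::chrono::duration<double, std::milli>(
                        clock::now() - start).count();
        std::fprintf(log, "%d,%.3f\n", i, ms);
    }
    std::fclose(log);
}
```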
Note that scene state load/save/record are handy for a lot of other things, from debugging to undo/redo to on-the-fly reload, not to mention savegames.
Add a screenshot taker + image diff, and you can unit test graphic code too.
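The diff itself can be very simple; here is a sketch assuming raw RGBA screenshot buffers of equal size (how they are captured or loaded is left to your engine and image library):

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// Compares two RGBA screenshots of the same size; returns the fraction of
// pixels whose channels differ by more than 'tolerance' (0-255).
double imageDiff(const std::vector<std::uint8_t>& a,
                 const std::vector<std::uint8_t>& b,
                 int tolerance = 2) {
    if (a.size() != b.size() || a.empty()) return 1.0;  // treat as total mismatch
    std::size_t pixelCount = a.size() / 4;
    std::size_t badPixels = 0;
    for (std::size_t p = 0; p < pixelCount; ++p) {
        for (int c = 0; c < 4; ++c) {
            int d = std::abs(int(a[p * 4 + c]) - int(b[p * 4 + c]));
            if (d > tolerance) { ++badPixels; break; }
        }
    }
    return double(badPixels) / double(pixelCount);
}
```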
If you can, add that to your CI server so that changes with a huge rendering impact don't go unnoticed. (It also helps artists when they check in their assets without having evaluated the rendering impact.)
A must-read on that kind of CI graphics testing is here: http://aras-p.info/blog/2011/06/17/testing-graphics-code-4-years-later/
Note: I'm responding to the question "Profiling a graphics rendering with a profiler", since that's something I was looking for ;)
I'm working mostly on Mac, and I'm using multiple tools:
gDebugger version 5.8 is available on Windows and Mac (this tool has been bought by AMD; the v6 version is Windows only). It gives you statistics about state changes, texture usage, draw calls, etc. It's also useful for debugging texture mapping and for seeing how your scene is drawn, step by step.
PVRUniSCoEditor is a shader editor. It compiles on the fly and gives you precious details about estimated cycles and register usage.
Instruments (from the Xcode utilities, OS X only) gets its information from the OpenGL driver; it's great for finding bottlenecks since you can track which part of the GPU is used at 100% (tiler, renderer, texture unit, etc.).
Adreno Profiler is a Windows tool for profiling Adreno-based mobile devices. (A very good tool if you work on Android apps ;))
What's your trick with the "dumb fragment shader to highlight the fillrate"? (Drawing a plain color? Or something more advanced?)

How can I stress the GPU

I would like to add some diagnostic code to our application that stresses both the CPU and GPU, and then measures heat. A third-party tool is not an option. From what I can tell, CUDA is not an option either, as it requires Nvidia's compiler - is that right? As far as I can tell, my best option is DirectX. Anything simple and non-visual on the GPU would do.
Platform: Windows XP Embedded
DirectX 9.0C
Simply create a shader in HLSL which contains an endless loop.
Turn off all culling and instancing and upload tons of triangle data to the GPU for processing and drawing; this will stress both the CPU (though not too much these days) and the GPU, which should suffer under the overdraw burden.
You should be able to use the code from any intro tutorial for this (ones that use DrawPrimitiveUP will stress the CPU more, but don't require the creation of GPU buffers). You probably also want vsync disabled, so that the GPU works as fast as it can (i.e. it doesn't wait too much, or at all, on other events).
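To tie those knobs together, here is a rough Direct3D 9-flavoured sketch (no culling, heavy overdraw, DrawPrimitiveUP, vsync off). The device is assumed to be an already-created IDirect3DDevice9 whose present parameters use D3DPRESENT_INTERVAL_IMMEDIATE, and generating the triangle soup is left out; this is an illustration, not a complete stress tool.

```cpp
#include <windows.h>
#include <d3d9.h>
#include <vector>

struct Vertex { float x, y, z; DWORD color; };
const DWORD kVertexFvf = D3DFVF_XYZ | D3DFVF_DIFFUSE;

// One stress frame: no culling, lots of overlapping triangles, and vertices
// pushed through DrawPrimitiveUP so the CPU re-copies them on every call.
void stressFrame(IDirect3DDevice9* device, const std::vector<Vertex>& tris) {
    device->SetRenderState(D3DRS_CULLMODE, D3DCULL_NONE);  // draw every face
    device->SetRenderState(D3DRS_ZENABLE, FALSE);          // maximise overdraw
    device->SetFVF(kVertexFvf);
    device->BeginScene();
    device->DrawPrimitiveUP(D3DPT_TRIANGLELIST,
                            static_cast<UINT>(tris.size() / 3),
                            tris.data(), sizeof(Vertex));
    device->EndScene();
    device->Present(nullptr, nullptr, nullptr, nullptr);
}
```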