I would like to add some diagnostic code to our application that stresses both the CPU and GPU, and then measures heat. A third party tool is not an option. From what I can tell, CUDA is not an option either, as it requires Nvidia's compiler - is that right? As far as I can tell, my best option is DirectX. Anything simple and non visual on the GPU would do.
Platform: Windows XP Embedded
DirectX 9.0C
Simply create a shader in HLSL which contain an endless loop.
Turn off all culling and instancing and upload tones of triangle data to the gpu for processing and drawing, this will stress both the CPU (not too much these days) and the GPU should suffer under the overdrawing burden.
one should be able to use the code for any intro tutorial for this (ones that use DrawPrimitiveUP will stress the CPU more, but don't require creation of GPU buffers). you probably also want vsync disabled, so that the GPU works as fast as it can(aka it doesn't wait too much/at all on other events)
Related
I have a code which basically draws parallel coordinates using opengl fixed func pipeline.
The coordinate has 7 axes and draws 64k lines. SO the output is cluttered, but when I run the code on my laptop which has intel i5 proc, 8gb ddr3 ram it runs fine. One of my friend ran the same code in two different systems both having intel i7 and 8gb ddr3 ram along with a nvidia gpu. In those systems the code runs with shuttering and sometimes the mouse pointer becomes unresponsive. If you guys can give some idea why this is happening, it would be of great help. Initially I thought it would run even faster in those systems as they have a dedicated gpu. My own laptop has ubuntu 12.04 and both the other systems have ubuntu 10.x.
Fixed function pipeline is implemented using gpu programmable features in modern opengl drivers. This means most of the work is done by the GPU. Fixed function opengl shouldn't be any slower than using glsl for doing the same things, but just really inflexible.
What do you mean by coordinates having axes and 7 axes? Do you have screen shots of your application?
Mouse stuttering sounds like you are seriously taxing your display driver. This sounds like you are making too many opengl calls. Are you using immediate mode (glBegin glVertex ...)? Some OpenGL drivers might not have the best implementation of immediate mode. You should use vertex buffer objects for your data.
Maybe I've misunderstood you, but here I go.
There are API calls such as glBegin, glEnd which give commands to the GPU, so they are using GPU horsepower, though there are also calls to arrays, other function which have no relation to API - they use CPU.
Now it's a good practice to preload your models outside the onDraw loop of the OpenGL by saving the data in buffers (glGenBuffers etc) and then use these buffers(VBO/IBO) in your onDraw loop.
If managed correctly it can decrease the load on your GPU/CPU. Hope this helps.
Oleg
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How does OpenGL work at the lowest level?
When we make a program that uses the OpenGL library, for example for the Windows platform and have a graphics card that supports OpenGL, what happens is this:
We developed our program in a programming language linking the graphics with OpenGL (eg Visual C++).
Compile and link the program for the target platform (eg Windows)
When you run the program, as we have a graphics card that supports OpenGL, the driver installed on the same Windows will be responsible for managing the same graphics. To do this, when the CPU will send the required data to the chip on the graphics card (eg NVIDIA GPU) sketch the results.
In this context, we talk about graphics acceleration and downloaded to the CPU that the work of calculating the framebuffer end of our graphic representation.
In this environment, when the driver of the GPU receives data, how leverages the capabilities of the GPU to accelerate the drawing? Translates instructions and data received CUDA language type to exploit parallelism capabilities? Or just copy the data received from the CPU in specific areas of the device memory? Do not quite understand this part.
Finally, if I had a card that supports OpenGL, does the driver installed in Windows detect the problem? Would get a CPU error or would you calculate our framebuffer?
You'd better work into computer gaming sites. They frequently give articles on how 3D graphics works and how "artefacts" present themselves in case of errors in games or drivers.
You can also read article on architecture of 3D libraries like Mesa or Gallium.
Overall drivers have a set of methods for implementing this or that functionality of Direct 3D or OpenGL or another standard API. When they are loading, they check the hardware. You can have cheap videocard or expensive one, recent one or one released 3 years ago... that is different hardware. So drivers are trying to map each API feature to an implementation that can be used on given computer, accelerated by GPU, accelerated by CPU like SSE4, or even some more basic implementation.
Then driver try to estimate GPU load. Sometimes function can be accelerated, yet the GPU (especially low-end ones) is alreay overloaded by other task, then it maybe would try to calculate on CPU instead of waiting for GPU time slot.
When you make mistake there is always several variants, depending on intelligence and quality of driver.
Maybe driver would fix the error for you, ignoring your commands and running its own set of them instead.
Maybe the driver would return to your program some error code
Maybe the driver would execute the command as is. If you issued painting wit hred colour instead of green - that is an error, but the kind that driver can not know about. Search for "3d artefacts" on PC gaming related sites.
In worst case your eror would interfere with error in driver and your computer would crash and reboot.
Of course all those adaptive strategies are rather complex and indeterminate, that causes 3D drivers be closed and know-how of their internals closely guarded.
Search sites dedicated to 3D gaming and perhaps also to 3D modelling - they should rate videocards "which are better to buy" and sometimes when they review new chip families they compose rather detailed essays about technical internals of all this.
To question 5.
Some of the things that a driver does: It compiles your GPU programs (vertex,fragment, etc. shaders) to the machine instructions of the specific card, uploads the compiled programs to the appropriate area of the device memory and arranges the programs to be executed in parallel on the many many graphics cores on the card.
It uploads the graphics data (vertex coordinates, textures, etc.) to the appropriate type of graphics card memory, using various hints from the programmer, for example whether the date is frequently, infrequently, or not at all updated.
It may utilize special units in the graphics card for transfer of data to/from host memory, for example some nVidia card have a DMA unit (some Quadro card may have two or more), which can upload, for example, textures in parallel with the usual driver operation (other transfers, drawing, etc).
Nowadays we have pretty advanced tools to iron out rendering, allowing to see the different stages, time taken by draw calls, etc. But without them the graphics pipeline is quite a black box when it comes to understand what is happening inside.
Suppose for some reason you have no such tool, or a very limited one. How would you measure anyway what is taking time in your rendering?
I am aware of tricks like discarding draw calls to see the CPU time, setting a 1x1 viewport to see the cost of geometry, using a dumb fragment shader to highlight the fillrate... They are useful already but only give a rough idea of what is going on, and tell nothing about the level of parallelism.
Also, getting the time spent in each stage per draw call seem to be difficult, especially when taking into account the lack of precision due to the noise when measuring.
What tricks do you use when your backpack is almost empty and you still have to profile your rendering? What is your personal Swiss army knife consisting in?
Frame time rendering time
Absolute time spent for small code/stage/etc. is not that relevant as GPU driver optimization/batching/parallelism/version makes it nearly impossible to have precise code measure without GPU counters. (which you can get if you use with vendors libs)
What you can measure easily is each single code change impact. You'll only get relative impact, and it's what you really need anyway. And that just using frame rendering time.
Ideally you should aim be able can edit shader or pipeline code during runtime, and have a direct way to check impact over a whole typical scene, like just comparing graphs between several code path. (beware of static scenes, otherwise you'll end with highly optimized static views, but poor dynamic scenes performance)
Here's the swiss army knife list:
scene states loader
scene recorder (camera paths/add-remove entities,texture, mesh, fake input, etc.) using scene states.
scene states saver
scene frame time logger (not just final average but each frame rendering time)
on-the-fly shader code reload
on-the-fly codepath switch
frame time log reader+graphs+statistic framework
Note that scene state load/save/record are handy for a lot of other things, from debugging to undo/redo to on-the-fly reload, not to mention savegames.
Add a screenshot taker + image diff, and you can unit test graphic code too.
If you can, add that to your CI server so that huge code impact doesn't go unnoticed. (helps also artists when they check-in their assets, without evaluating rendering impact)
A must read on that related CI graphic test work is there : http://aras-p.info/blog/2011/06/17/testing-graphics-code-4-years-later/
Note: I'm responding to the question: "Profiling a graphics rendering with a profiler", since that something I was looking for ;)
I'm working mostly on Mac, and I'm using multiple tools:
gDebugger version 5.8 is available on Windows and Mac (this tool has been bought by AMD, the v6 version is Windows only). It gives you statistics about state changes, texture usage, draw calls, etc. It's also usefull to debug texture mapping, and see how your scene is drawn, step by step.
PVRUniSCoEditor it's a shader editor. It compiles on the fly and give you precious details about estimated cycles and registers usage.
Instruments (from XCode Utilities, OSX only), it gets informations from the OpenGL driver, it's great to find bottleneck since you can track what part of the GPU is used at 100% (tiler, renderer, texture unit, etc...)
Adreno Profiler a Windows tool to profile Adreno-based mobile devices. (Very good tool if you work on Android apps ;))
What's your trick about the "dumb fragment shader to highlight the fillrate" ? (drawing a plain color ? or something more advanced ?)
To what extend does OpenGL's GLSL utilize SLI setups? Is it utilized at all at the point of execution or only for end rendering?
Similarly, I know that OpenCL is alien to SLI but assuming one has several GPUs, how does it compare to GLSL in multiprocessing?
Since it might depend on the application, e.g. common transformation, or ray tracing, can you offer insight on differences depending on application type?
The goal of SLI is to divide the rendering workload on several GPU. First, the graphic driver uses a either a Sort-first or time decomposition (GPU0 works on frame n while GPU1 works on frame n+1) approach. And then, the pixels are copied from one GPU to the other.
That said, SLI has nothing to do with the shading language used by OpenGL (the way the pixels are drawn doesn't really matter).
For OpenCL, I would say that you have to divide your workload between the GPU by yourself, but I am not sure.
If you want to take advantage of multiple GPUs with OpenCL, you will have to create command queues for each device and run kernels on each device after splitting up the workload.
See http://developer.nvidia.com/object/sli_best_practices.html
Basically, you have to instruct the driver that you want to use SLI, and in which mode. After this, the driver will (almost) seamlessly do all the work for you.
Alternate Frame Rendering : no sync needed, so better performance, but more lag
Split Frame Rendering : lots of sync, some vertices are processed twice, but less lag.
For you GLSL vs OpenCL comparison, I don't know of any good benchmark. I'd be interested, though.
Ive got an application which drops to around 10fps. I profiled it with xperf which showed my app was using just 20% of the CPU, with none of my methods using a larger than expected amount of that 20%.
This seems to indicate that the vast drop in fps is because the graphics card isnt able to keep up with rendering the frame, resulting in my program stopping while it catches up...
Is there some way to profile what the graphics card is up to and work out what my program is telling it to do thats slowing it down, so that I can try to improve the frame rate?
For debugging / profiling graphics, try Nvidia PerfHUD
NVIDIA PerfHUD is a powerful real-time performance analysis tool for Direct3D applications.
There is also an ATI solution, called 'GPU PerfStudio'
GPU PerfStudio is a real-time performance analysis tool which has been designed to help tune the graphics performance of your DirectX 9, DirectX 10, and OpenGL applications. GPU PerfStudio displays real-time API, driver and hardware data which can be visualized using extremely flexible plotting and bar chart mechanisms. The application being profiled maybe executed locally or remotely over the network. GPU PerfStudio allows the developer to override key rendering states in real-time for rapid bottleneck detection. An auto-analysis window can be used for identifying performance issues at various stages of the graphics pipeline. No special drivers or code modifications are needed to use GPU PerfStudio.
You can find more information and download links here:
http://developer.nvidia.com/object/nvperfhud_home.html
http://developer.amd.com/tools-and-sdks/graphics-development/gpu-perfstudio/
Also, check out this article on FPS:
FPS vs Frame Time
Basically it talks about the fact that a drop from 200fps to 190fps is negligible, whereas a drop from 30fps to 20fps is a MUCH bigger deal. For better performance measuring, you should be calculating frame time rather than FPS.
You never told us what your fps is or what the program is doing at all, so your "vast drop" might not be a big deal at all.
For DirectX, there is PIX for profiling the CPU and GPU operations. It can give very detailed info, and might be worth looking into.
Hope that helps!
You can try using dxprof (search in google). It's lightweight app that draws real-time bars, each bar corresponding to one DirectX event (such as draw-call or resource copy). You can freeze the bars and check calls stack to find out where the draw-call originates from.
Are you developing for Windows? If so avoid using Video for Windows as this will limit you in the manner that you describe. Use DirectX instead.
No need to guess. Just pause it a few times under the IDE, and it will show you exactly what it's waiting for.