I'm trying to profile a 64-bit OpenGL application using the MSVS 2013 profiler (CPU sampling). According to Sysinternals Process Explorer, my application seems to use only 60% of GPU resources but 100% of a CPU core (it's single-threaded for the time being), so the CPU code seems to be the bottleneck. I then tried to find the hotspots in order to optimize or parallelize my code.
However, the profiling results show that 98% of the time is spent in nvogl64v.dll; most notably, 75% is spent within gdi32.dll and 6% in KernelBase.dll.
I have no clue what to do with this information or what optimizations in my code could help. What conclusion can I draw from it? I'm using freeglut for windowing, and the profiler reports that a negligible 2% is spent in freeglut.dll, and thus in my idle and display functions, so I'm not sure whether any changes to my update and draw loops would have any effect.
Any hints?
EDIT:
I have now figured out how to load the corresponding debugging symbols from the Microsoft Symbol Servers, so I can go one step deeper into the call stack. It turns out that the gdi32.dll portion is spent mainly in NtGdiDdDDIEscape (55%) and NtGdiDdDDIGetDeviceState (17%), while the KernelBase.dll portion is due to SwitchToThread.
I am using Eclipse for C/C++ development. I am trying to compile and run a project. After I compile and run the project, my CPU usage climbs to 100% after a while. I checked Task Manager and found that Eclipse isn't closing the previous builds; they keep running in the background and use my CPU heavily. How do I solve this problem? At 100% usage my PC becomes very slow.
If you don't want the build to use up all your CPU time (maybe because you want to do other work while building), you can reduce the build's parallelism to a point where it leaves one or more cores unused. For example, if you have 8 cores, you could configure your build to use only 6 of them.
Your build will take longer, but your machine will be more responsive for other tasks while the build runs.
Adding more RAM seems to have solved my problem, and disk usage is also low now. Perhaps, because there wasn't enough RAM in my laptop, the system had to keep reading data from disk instead of memory, which drove disk usage up.
I have ported an app from the Visual Studio compiler to MinGW and ran into a problem: performance degradation. CPU usage increased from 30% to 100%.
There is one interesting thing: if I run Windows Media Player before or while running my app, my app performs fine. CPU usage drops to 30% and the app runs about 10 times faster.
I searched for this and found that it relates to a service called the Multimedia Class Scheduler Service (MMCSS). The main problem is that this service only works under Windows Vista and later, but I ported and tested my app under Windows XP.
So, does anyone know how to use this feature under XP? And how does Windows Media Player increase the performance of my app?
Windows Media Player changes the resolution of the system multimedia timer. Basically, this occurs when your application really should be using something like the High Performance Timer but is using the multimedia timer instead, which simply doesn't have and isn't intended to have the necessary accuracy or resolution to be a high-performance timer. As a result, any timings in your program essentially don't work as they should, which is especially bad if you're trying to sleep or block for a fixed time.
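On Windows, the high-resolution timer this answer refers to is exposed through QueryPerformanceCounter (a tick count) and QueryPerformanceFrequency (ticks per second). A minimal sketch of turning two counter readings into elapsed microseconds follows. It is plain C so the arithmetic can be checked anywhere; it assumes the caller obtains the tick difference and the frequency from the QuadPart fields of those two Windows calls, and the helper name ticks_to_us is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Converts a difference of two QueryPerformanceCounter() readings into
   microseconds, given the QueryPerformanceFrequency() value (ticks per
   second). Splitting into whole seconds and a remainder avoids the
   overflow that ticks * 1000000 would hit for long intervals. */
static uint64_t ticks_to_us(uint64_t ticks, uint64_t freq)
{
    assert(freq > 0);
    uint64_t whole_s = ticks / freq;   /* complete seconds */
    uint64_t rem = ticks % freq;       /* leftover ticks, always < freq */
    return whole_s * 1000000u + (rem * 1000000u) / freq;
}
```

On Windows you would call QueryPerformanceFrequency once at startup, read QueryPerformanceCounter before and after the code being timed, and pass the difference of the two values to the helper.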
I am trying to optimize my OpenCL kernels, and all I have right now is the NVIDIA Visual Profiler, which seems rather limited. I would like to see a line-by-line profile of my kernels to better understand issues with memory coalescing, etc. Is there a way to get more thorough profiling data than that provided by the Visual Profiler?
I think that AMD CodeXL is what you are looking for. It's a free set of tools that contains an OpenCL debugger and a GPU profiler.
The OpenCL debugger allows you to do line-by-line debugging of your OpenCL kernels and host code, view all variables across different workgroups, view special events and errors that occur, etc.
The GPU profiler has a nice feature that generates a timeline displaying how long your program spends on tasks like data transfer and kernel execution.
For more info and download links, check out http://developer.amd.com/tools-and-sdks/heterogeneous-computing/codexl/
No, there is no such tool, but you can profile your code changes: measure the speed of your code, change something, and then measure it again. clEnqueueNDRangeKernel has an event argument that can be passed to clGetEventProfilingInfo afterwards (the command queue must be created with CL_QUEUE_PROFILING_ENABLE). The timer is very precise, with accuracy on the order of microseconds. This is the only way to measure the performance of a separate part of the code.
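A small sketch of the measure, change, measure-again loop described above: for each run, the CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END values returned by clGetEventProfilingInfo (cl_ulong device timestamps in nanoseconds) are collected and summarized. The OpenCL calls themselves are left out so the arithmetic stands alone; uint64_t stands in for cl_ulong, and the names summarize_runs and kernel_timing are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Summary of repeated kernel runs, in microseconds. */
typedef struct {
    double min_us;
    double mean_us;
} kernel_timing;

/* start_ns[i] and end_ns[i] hold the CL_PROFILING_COMMAND_START and
   CL_PROFILING_COMMAND_END values (nanoseconds) read with
   clGetEventProfilingInfo for run i. */
static kernel_timing summarize_runs(const uint64_t *start_ns,
                                    const uint64_t *end_ns, size_t runs)
{
    assert(runs > 0);
    kernel_timing t = { 0.0, 0.0 };
    double sum = 0.0;
    for (size_t i = 0; i < runs; ++i) {
        assert(end_ns[i] >= start_ns[i]);
        double us = (double)(end_ns[i] - start_ns[i]) / 1000.0;
        sum += us;
        if (i == 0 || us < t.min_us)
            t.min_us = us;   /* fastest run seen so far */
    }
    t.mean_us = sum / (double)runs;
    return t;
}
```

Comparing the minimum over several runs before and after a change tends to be less noisy than comparing single runs, because one-off delays only make a run slower.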
I haven't tested it, but I just found this program: http://www.gremedy.com/gDEBuggerCL.php.
The description is: " This new product brings gDEBugger's advanced Debugging, Profiling and Memory Analysis abilities to the OpenCL developer's world..."
LTPV is an open-source OpenCL profiler that may fit your requirements. For now, it only works under Linux.
(disclosure: I am the developer of this tool)
I'm looking for a way to profile my Open MPI program in C. I'm using Open MPI 1.3 on Ubuntu 9.10, and my programs run on an Intel Duo T1600.
What I want to profile is cache misses, memory usage, and execution time in any part of the program.
For Linux, I recommend Zoom for this kind of profiling. You can get a free 30-day evaluation to try it out.
I finally found graphical tools for MPI profiling:
Vampir: www.vampir.eu
ParaProf: http://www.cs.uoregon.edu/research/tau/docs/paraprof/index.html
Have a look at gprof and at Intel's VTune. Valgrind with the cachegrind tool could be useful, too.
Allinea MAP is ideal for this. It will highlight poor cache performance, memory usage and execution time right down to the source lines in your code. There is no need to recompile or instrument the application in order to profile it with Allinea MAP - which makes it unusually easy to get started with. On most HPC systems and with most MPIs it takes your binary, runs it, and loads up the source code automatically to display the recorded performance data.
Take a look at profiling MPI. Some profiling tools are mpiP and pgprof.
I want to be able to get the current % CPU usage in a C++ program running under Windows CE.
I found this link, which states where the source code is, but I cannot find it in my Platform Builder installation. I expect this is because it isn't the Windows Automotive platform.
Does anyone know where I can find this source code or (even better) how I can get this information directly, i.e., which DLLs or function calls to use?
Since GetProcessTimes doesn't exist in CE, you have to calculate this.
You have to start with the toolhelp APIs to enumerate the processes and the threads in the processes. You then call GetThreadTimes for each thread and add all that up.
Bear in mind that the act of calculating this info will affect the CPU utilization.
I have found that GetIdleTime (or CeGetIdleTimeEx on WEC7 or newer) works well for calculating system-wide processor usage. Sample code for calculating the processor idle-time percentage is shown on the GetIdleTime MSDN page. Processor utilization can then be calculated by subtracting the idle percentage from 100.
The MSDN page does warn that support for GetIdleTime is dependent on OAL implementation.
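A sketch of the idle-time calculation, assuming the caller reads GetTickCount() and GetIdleTime() (both in milliseconds) before and after an interval such as Sleep(1000). Only the arithmetic is shown, in plain C; the struct idle_sample and the function system_cpu_percent are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* One sample of the two millisecond counters. Both are 32-bit, so
   unsigned subtraction stays correct across a single wrap-around. */
typedef struct {
    uint32_t tick_ms;  /* assumed: value of GetTickCount() */
    uint32_t idle_ms;  /* assumed: value of GetIdleTime() */
} idle_sample;

/* Returns system-wide CPU utilization (0-100) between two samples,
   computed as 100 minus the idle percentage. */
static unsigned system_cpu_percent(idle_sample start, idle_sample stop)
{
    uint32_t elapsed = stop.tick_ms - start.tick_ms;
    uint32_t idle = stop.idle_ms - start.idle_ms;
    assert(elapsed > 0);
    if (idle > elapsed)      /* clamp small timing jitter */
        idle = elapsed;
    unsigned idle_pct = (unsigned)((100ull * idle) / elapsed);
    return 100u - idle_pct;
}
```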
Note that when using the toolhelp APIs to calculate CPU usage, you need to take two measurements and then calculate the difference. When doing so, you won't know how much CPU time was used by threads that terminated before the second sample was taken.
So, applications that often create short-lived threads will not be represented properly in your result.
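The two-sample toolhelp approach can be sketched as follows. It assumes each snapshot is a list of (thread ID, kernel time, user time) rows gathered by walking the threads with the toolhelp APIs and calling GetThreadTimes on each, with the kernel and user FILETIME values converted to 64-bit counts of 100-ns units. Threads are matched by ID between snapshots, so a thread that exited before the second snapshot simply drops out, which is the blind spot described in this answer. The type thread_times and both functions are hypothetical names; only the arithmetic is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row per thread; times are in 100-ns units (FILETIME resolution). */
typedef struct {
    uint32_t thread_id;
    uint64_t kernel_100ns;
    uint64_t user_100ns;
} thread_times;

static uint64_t busy_time(const thread_times *t)
{
    return t->kernel_100ns + t->user_100ns;
}

/* CPU time consumed between two snapshots, in 100-ns units. A thread
   present only in the second snapshot is counted from zero; a thread
   that exited before the second snapshot is not counted at all. */
static uint64_t cpu_time_delta(const thread_times *before, size_t n_before,
                               const thread_times *after, size_t n_after)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n_after; ++i) {
        uint64_t prev = 0;
        for (size_t j = 0; j < n_before; ++j) {
            if (before[j].thread_id == after[i].thread_id) {
                prev = busy_time(&before[j]);
                break;
            }
        }
        uint64_t now = busy_time(&after[i]);
        /* A smaller value than before means the ID now belongs to a new thread. */
        total += (now >= prev) ? now - prev : now;
    }
    return total;
}

/* Percentage of one CPU used over the wall-clock interval, which must
   also be in 100-ns units (e.g. a GetTickCount() difference * 10000). */
static double process_cpu_percent(uint64_t cpu_delta_100ns,
                                  uint64_t wall_delta_100ns)
{
    assert(wall_delta_100ns > 0);
    return 100.0 * (double)cpu_delta_100ns / (double)wall_delta_100ns;
}
```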
You can look into Remote Task Monitor. It lets you get the current % CPU usage of your process (or thread), which is exactly what you are looking for. It is also very lightweight and does not impact your device much.