Tool to count the miss rate of assembly - c++

I want to speed up the execution of my code cpp and thought of looking at cache misses maybe I could change locality. So is there any such a tool? that would tell me the miss rate of every insns in assembly
Regards

Intel has such tools (VTune). valgrind is a free alternative. I seem to remember that IBM in the purify tools suite has also something for that.

Intel VTune (Windows & Linux)
AMD CodeAnalyst (Windows)

FWIW: If you have a particular system that you are targeting, it is possible that there might exist a simulator which will do this type of thing, but I think you would find in practice that the caching mechanisms vary so greatly from system to system that it would be very difficult to get a meaningful statistic.

Related

Microarchitectural profiling of C++ and assembly code on MIPS

As part of course project, I need to analyze a piece of C++ code for performance and find out which parts of the Computer Architecture (MIPS or x86) are mostly utilized while running the code and is possibly a bottleneck for the performance. I am looking at various Profilers for analyzing the performance and came across SimpleScalar which is a great tool but sadly only works with C code.
Since I am more familiar with MIPS architecture it would be great if there's a tool like SimpleScalar for simulating and profiling C++ code for MIPS. I am looking at the performance critical parts like branch, cache, instruction set, addressing modes etc. If not, mention of any tool which can do the similar kind of analysis for x86 architectures would be great as well.
(Just to clarify, I'm not looking for any old profiler, but for one that understands the CPU microarchitecture and knows what parts of the CPU are taken advantage of or underused.)
CACTI has detailed low-level simulation of cache.
SESC is a cycle accurate computer architecture simulator that supports MIPS.
SESC includes CACTI.
I doubt that what you want is possible. C++ is the language, but it still needs to be compiled to the target architecture. The optimisations (or the lack of them) will determine a lot of your performance criteria like cache use, etc. So I guess you need to look for machine level profilers (Hopefully they support the debug format of your compiler, so you see source code context).
My understanding is that SimpleScalar can simulate and profile MIPS machine code, no matter what the original language it was compiled from.
(The source-level debugger "DLite!" that comes with SimpleScalar may only support a few languages, but it sounds like you don't need to "debug" your code.)

Performance profiling on Linux

What are the best tools for profiling C/C++ applications on *nix?
(I'm hoping to profile a server that is a mix of (blocking) file IO, epoll for network and fork()/execv() for some heavy lifting; but general help and more general tools are all also appreciated.)
Can you get the big system picture of RAM, CPU, network and disk all in one overview, and drill into it?
There's been a lot of talk on the kernel lists about things like perf timechart, but I haven't found anything turning up in Ubuntu yet.
I recommend taking stackshots, for which pstack is useful. Here's some more information:
Comments on gprof.
How stackshots work.
A blow-by-blow example.
A very short explanation.
If you want to spend money, Zoom looks like a pretty good tool.
For performance, you can try Callgrind, a Valgrind tool. Here is a nice article showing it in action.
Compile with -pg, run the program, and then use gprof
Compiling (and linking) with -pg adds profiling code and the profiling libraries to the executable, which then produces a file called gmon.out that contains the timing information. gprof displays call graphs and their (absolute and relative) timings.
See man gprof for details.
The canonical example of a full system profiling tool (for Solaris, OS X, FreeBSD) is DTrace. But it is not yet fully available on Linux (you can try here but the site is down for me at the moment, and I haven't tried it myself). There are many tools, in various states of usefulness, for doing full system profiling and kernel profiling on Linux.
You might consider investigating:
oprofile
SystemTap
bootchart
strace (e.g. this SO answer
Allinea MAP is a profiler for C++ and other native languages on Linux. It is commercially supported by my employer. It has a graphical interface and source-line level profiling and profiles code with almost no slowdown which makes it very accurate where timing of other subsystems is relevant - such as for IO.
Callgrind has been useful and accurate - but the slowdown was ~5x so I could only do smaller runs. It can actually count the number of times a function is called which is useful for understanding asymptotic behavior.
Description of using -gp and gproff here http://www.ibm.com/developerworks/library/l-gnuprof.html
oprofile might interest you. Ubuntu should have all the packages you need.
If you can take your application to freeBSD, OS X , or Solaris you can use dtrace, although dtrace is an analyst oriented tool -- i.e., you need to drive it -- read: script it. Nothing else can give you the level of granularity you need; Dtrace can not just profile the latencies of function calls in user-land; it can also follow a context switch into the kernel.
As mentioned in the accepted answer, Zoom can do some amazing things. I've used it to understand thread behavior all the way down to optimizing the assembly generated by the compiler.
The FOSS answer, as already mentioned, is to build with -pg and then use gprof to analyse the output. If it's a product/project that justifies throwing some money at, I would also be tempted to use IBM/Rationals Quantify profiler as that makes it easier to view the profiling data, drill down to the line level or look at it in a '10000ft' level.
Of course there might be viewer for gprof available that can do the same thing, but I am not aware of any.

C++ Code Profiler

Can anybody recommend a good code profiler for C++?
I came across Shiny - any good? http://sourceforge.net/projects/shinyprofiler/
Callgrind for Unix/Linux
DevPartner for Windows
Not C++ specific, but AMD's CodeAnalyst software is free and is feature-packed.
http://developer.amd.com/cpu/codeanalyst/codeanalystwindows/Pages/default.aspx
Gprof if you use gcc. It may not be user friendly but still useful.
Probably you will be interested in Intel VTune. Rather useful and allows to collect low-level events like cache misses which helps a lot in tuning.
Quantify (part of the IBM/Rational PurifyPlus package) is a very good profiler, but not exactly cheap. It is available on several platforms, too - I've used it on Solaris, Windows and Linux.
Depends on what you need to do:
Measure, so you can do regressions testing to see if changes in performance happened.
Find reasons for suboptimal performance and optimize them.
These are not the same.
For 1, use one of the recommended profilers.
For 2, the profiler I much prefer is one you already have:
http://www.wikihow.com/Optimize-Your-Program%27s-Performance
To see how this goes, check this out.
For C++, as for C# and any language that encourages layers of abstraction, those layers may or may not be good from a software engineering standpoint, but they can kill performance. Every method call is a detour in the execution of your program, and the style encourages you to nest those things, sometimes needlessly. Also the style discourages you from knowing or caring what goes on inside them. You may find them creating and deleting objects underneath at a rate and level of generality far beyond what your application really needs.
AQtime (for Windows)
If you are running a Premium version of VS 2010 then you get a profiler with it.
I've also used a couple of other free ones, but they don't compare to the on MS ships. Useful as a second opinion though.
If you have access to a Mac, then I recommend using Shark from the CHUD tools.
You can use the analyzer that´s in Sun Studio 12 on Linux or Solaris. Itś free. http://developers.sun.com/sunstudio/index.jsp
If you cannot locate DevPartner it is because we've moved under new ownership. Check us out on the Micro Focus website: http://www.microfocus.com/products/micro-focus-developer/devpartner/index.aspx. Shameless plug: I work on the DevPartner team. Our long awaited 64-bit versions of BoundsChecker and C++/.NET profilers ship on February 4, 2011. We've changed our pricing model so you can choose either the whole suite or just the performance profiler if that's what you need. Please check out the new DPS 10.5 release when it goes live!

Decent profiler for Windows? [duplicate]

This question already has answers here:
What are some good profilers for native C++ on Windows? [closed]
(8 answers)
Closed 9 years ago.
Does windows have any decent sampling (eg. non-instrumenting) profilers available? Preferably something akin to Shark on MacOS, although i am willing to accept that i am going to have to pay for such a profiler on windows.
I've tried the profiler in VS Team Suite and was not overly impressed, and was wondering if there were any other good ones.
[Edit: Erk, i forgot to say this is for C/C++, rather than .NET -- sorry for any confusion]
For Windows, check out the free Xperf that ships with the Windows SDK. It uses sampled profile, has some useful UI, & does not require instrumentation. Quite useful for tracking down performance problems. You can answer questions like:
Who is using the most CPU? Drill down to function name using call stacks.
Who is allocating the most memory?
Outstanding memory allocations (leaks)
Who is doing the most registry queries?
Disk writes? etc.
I know I'm adding my answer months after this question was asked, but I thought I'd point out a decent, open-source profiler: Very Sleepy.
It doesn't have the feature count that some of the other profilers mentioned before do, but it's a pretty respectable sampling profiler that will work very well in most situations.
Intel VTune is good and is non-instrumenting. We evaluated a whole bunch of profilers for Windows, and this was the best for working with driver code (though it does unmanaged user level code as well). A particular strength is that it reads all the Intel processor performance counters, so you can get a good understanding of why your code is running slowly, and it was useful for putting prefetch instructions into our code and sorting out data layout to work well with the cache lines, and the way cache lines get invalidated in multi core systems.
It is commercial, and I have to say it isn't the easiest UI in the world.
AMD's CodeAnalyst is FREE here
We use both VTune and AQTime, and I can vouch for both. Which works best for you depends on your needs. Both have free trial versions - I suggest you give them a go.
The Windows Driver Kit includes a non-instrumenting user/kernel sampling profiler called "kernrate". It seems useful for profiling multi-process applications, applications that spend most of their time in the kernel, and device drivers (of course). It's also available in the KrView (Kernrate Viewer) and Windows Server 2003 Resource Kit Tools packages.
Kernrate works on Windows 2000 and later (unlike Xperf, which requires Vista / Server 2008). It's command-line based and the documentation has a somewhat intimidating list of options. I'm not sure if it can record call stacks or just the program counter. If you use a symbol server, make sure to put an up-to-date dbghelp.dll and symsrv.dll in the same directory as kernrate.exe to prevent it from using the ancient version of dbghelp.dll that is installed in %SystemRoot%\system32.
I have tried Intel's vtune with a rather large project about two years ago. It was an instrumenting profiler then and it took so long to instrument the DLL that I was attempting to profile that I eventually lost patience after an hour.
The one tool that I have had quite good success and which i would highly recommend is that of AQTime. It not only provides excellent performance profiling resources but it also doe really good memory profiling which has been of significant help to me in tracking down memory leaks.
Luke Stackwalker seems promising -- it's not as polished as I'd like, but it is open source and it does do something that seems very close to what #Mike Dunlavey keeps saying we ought to do. (Of course, it then tries to smoosh it all down into the typically-unhelpful call graphs that Mike is so weary of, but it shouldn't be too hard to fix that with the source as our ally.)
It even seems to count time spent waiting in the kernel, as far as I can tell...
I'm not sure what a non-instrumenting profiler is, but I can say for .NET I love RedGate's ANTS Profiler. Version 3 beats the MS version for ease of use and Version 4, which allows arbitrary time slices, makes MS look like a joke.

How to measure performance in a C++ (MFC) application?

What good profilers do you know?
What is a good way to measure and tweak the performance of a C++ MFC application?
Is Analysis of algorithms really neccesary? http://en.wikipedia.org/wiki/Algorithm_analysis
I strongly recommend AQTime if you are staying on the Windows platform. It comes with a load of profilers, including static code analysis, and works with most important Windows compilers and systems, including Visual C++, .NET, Delphi, Borland C++, Intel C++ and even gcc. And it integrates into Visual Studio, but can also be used standalone. I love it.
If you're (still) using Visual C++ 6.0, I suggest using the built-in profiler. For more recent versions you could try Compuware DevPartner Performance Analysis Community Edition.
For Windows, check out Xperf which ships free with the Windows SDK. It uses sampled profile, has some useful UI, & does not require instrumentation. Quite useful for tracking down performance problems. You can answer questions like:
Who is using the most CPU? Drill down to function name using call stacks.
Who is allocating the most memory?
Who is doing the most registry queries?
Disk writes? etc.
You will be quite surprised when you find the bottlenecks, as they are probably not where you expected!
It's been a while since I profiled unmanaged code, but when I did I had good results with Intel's vtune. I'm sure somebody else will tell us if that's been overtaken.
Algorithmic analysis has the potential to improve your performance more profoundly than anything you'll find with a profiler, but only for certain classes of application. If you operate over reasonably large sets of data, algorithmic analysis might find ways to be more efficient in CPU/Memory/both, but if your app is mainly form-fill with a relational database for storage, it might not offer you much.
Intel Thread Checker via Vtune performance analyzer- Check this picture for the view i use the most that tells me which function eats up the most of my time.
I can further drill down inside and decompose which functions inside them eats up more time etc. There are different views based on what you are watching (total time = time within fn + children), self time (time spent only in code running inside the function etc).
This tool does a lot more than profiling but i haven't explored them all. I would definitely recommend it. The tool is also available for downloading as a fully functional trial version that can run for 30 days. If you have cost constraints, i would say this window is all that you require to pin point your problem.
Trial download here - https://registrationcenter.intel.com/RegCenter/AutoGen.aspx?ProductID=907&AccountID=&ProgramID=&RequestDt=&rm=EVAL&lang=
ps : I have also played with Rational Rational but for some reason I did not take much to it. I suspect Rational might be more expensive than Intel too.
Tools (like true time from DevPartner) that let you see hit counts for source lines let you quickly find algorithms that have bad 'Big O' complexity. You still have to analyse the algorithm to determine how to reduce the complexity.
I second AQTime, having both AQTime and Compuwares DevPartner, for most cases. The reason being that AQTime will profile any executable that has a valid PDB file, whereas TrueTime requires you to make an instrumented build. This greatly speeds up and simplifies ad hoc profiling. DevPartner is also quite a bit more expensive if this is an issue. Where DevPartner comes into its own is with BoundsChecker, which I still rate as a better tool for catching leaks and overwrites than AQTimes execution profiler. TrueTime can be slighly more accurate than AQTime, but I have never found this to be an issue.
Is profiling worthwhile, IMO yes, if you need performance gains on a local application. I think you also learn a lot about how your program and algorithms really work, and the cost implications of using certain types of object classes for storing and iterating through your data.
Glowcode is a very nice profiler (when it works). It can attach to a running program and requires only symbol files - you don't need to rebuild.
Some versions pf visual studio 2005 (and maybe 2008) actually come with a pretty good performance profiler.
if you have it it should be available under the tools menu
or you can search for a way to open the "performance explorer" window to start a new performance session.
A link to MSDN
FYI, Some versions of Visual Studio only come with a non-optimizing compiler. For one of my my MFC apps if I compile it with MINGW/MSYS ( gcc compiler ) with -o3 then it runs about 5-10x as fast as my release compile with Visual Studio.
For example I have an openstreetmap xml compiler and it takes about 3 minutes ( the gcc compiled version) to process a 2.7GB xml file. My visual studio compile of the same code takes about 18 minutes to run.