VSPerfReport: exclusive time equal to inclusive time - c++

I am profiling C++ using an instrumenting profiler from Microsoft (VSPerf) and converting the .vsp report file to .csv using VSPerfReport. In Report_FunctionSummary.csv the exclusive and inclusive times are the same for functions which do call other functions in the same object file. Is this a known problem and how can it be solved? I'm using Visual Studio 2008.

Inclusive time is exclusive time plus time in callees. So if a routine has no callees, inclusive time equals exclusive time. It will also be equal if the routine did call other routines but the profiler could not see those other routines.

Related

Writing program execution time to .txt file

My IDE is Visual Studio 2010 with the integrated Intel Fortran compiler (Intel Parallel Studio XE 2011).
I intend to write some interesting information from the program run to an Output.txt file. In this case I want to write the program's execution time, or the execution time of a specific do loop or do while loop, because I want to know how much time every single loop takes.
Is there any way to do that?

How to calculate ANSI C code performance?

I have written a simple code in ANSI C and now would want to perform some measurements.
I have measured execution time (using clock() function under the Windows OS and clock_gettime() under the Linux OS).
Now I would like to calculate how many IPS (Instructions Per Second) my CPU executes while running this code of mine. (Yes, I know that MIPS is a pathetic parameter, but even so, I want to calculate it.)
It would also be nice to see how many CPIs (Cycles Per Instruction) it takes to perform e.g. the addition of 3 elements, and the other operations I perform.
Google says how to calculate the number of MIPS using a calculator, some knowledge about my CPU (its clock speed), simple math and a bunch of other parameters (like CPI), but doesn't say HOW to obtain those!
I also haven't found any C/C++ function which would return the number of clock cycles needed to perform e.g. an access to a local variable.
There is also the problem of finding a reference manual by Intel/AMD for a modern CPU which would have information about opcodes and the like.
I have manually calculated that my ANSI C code takes 37 operations, but those are ANSI C operations, not CPU instructions.
The easiest way of getting high-accuracy timing on Windows is the performance counter; see How to use QueryPerformanceCounter?
Then you simply need some functions that perform the operations you are interested in timing. You have to be a little careful of caching etc., so run the calculation several times and look at the distribution of times.

Windows vs. Linux memory allocation/std::list constructor performance

I'm porting C++ code from Linux to Windows. During this process, I found out that the following line runs ~10 times slower under Windows (on exactly the same hardware):
list<char*>* item = new list<char*>[160000];
On Windows it takes ~10ms, while on Linux it takes ~1ms. Note that this is the average time. Running this line 100 times takes ~1 second on Windows.
This happens both on win32 and x64, both versions are compiled in Release, and the speed is measured via QueryPerformanceCounter (Windows) and gettimeofday (Linux).
The Linux compiler is gcc. The Windows compiler is VS2010.
Any idea why this could happen?
It could be more an issue of library implementation. I would expect a
single allocation in most cases, with the default constructor of list
not allocating anything. So what you're trying to measure is the cost
of the default constructor of list (which is executed 160,000 times).
I say "trying to measure", because any measurements that small are
measuring clock jitter and resolution more than they're measuring code
execution times. You should put this in a loop, to execute it
frequently enough to get a runtime of a couple of seconds. And when you
do this, you need to take precautions to ensure that the compiler
doesn't optimize anything out.
And under Linux, you want to measure using clock(), at least; the wall
clock time you get from gettimeofday is very dependent on what else
happens to happen at the same time. (Don't use clock() under Windows,
however. The Windows implementation is broken.)
I think this statement takes very little time on both OSes (regardless of anything else). It takes so little time that you may actually be measuring the resolution of your timers.

Visual Studio 2008 Profiler - Instrumented produces strange results

I run the Visual Studio 2008 profiler on a "RelDebug" build of my app. Optimizations are on, but inlining is only moderate, stack frames are present, and symbols are emitted. In other words, RelDebug is a somewhat optimized build that can be debugged (although the usual Release caveats about inspecting variables applies).
I run both the Sampling, and the Instrumented profiler on separate runs.
Result? The Sampling profiler produces a result that looks reasonable. However, when I look at the Instrumented profiler results, I see functions that should not even be near the top of the list coming out at the top.
For example, a function like "SetFont" that consists of only 1 line assigning the height to a class member. Or "SetClipRect" that merely assigns a rectangle.
Of course I am looking at "Exclusive" stats (i.e. minus children).
Has this happened to anyone else? It always seems to happen once my application has grown to a certain size, and it makes the instrumented profiler useless at that point.
I figured out the problem. Both the Visual Studio 2008 and the Visual Studio 2010 profilers are mediocre (to put it politely). I bought Intel C++ Studio which comes with vTune Amplifier (a profiler). Using the Intel profiler on the exact same code I was able to get profiler results that actually made sense.
You say "of course" you are looking at Exclusive stats. Look at inclusive stats instead. In all but the simplest programs or algorithms, nearly all the time is spent in subroutines and functions, so if you've got a performance problem, it most likely consists of calls you didn't know were time-hogs.
The method I rely on is this. Assuming you are trying to find out what you could fix to make the code faster, it will find it, while not wasting your time with high-precision statistics about things that are not problems.
There's no bug. Sampling cannot tell you how much time you spent per call; the profiler just counts how many times the sampling timer landed in that specific function. Since SetFont is not frequently called, you don't get many hits in it, and you get the impression that the function is not time-consuming.
On the other hand, when you run instrumentation, the profiler counts every call and measures the execution time of every function. That is why you get accurate information about each function's CPU consumption.
When examining instrumentation results you must always look at the number of calls as well. Since SetFont is more or less an API call, it doesn't matter whether you look at exclusive or inclusive time; the only things that matter are its overall time and how frequently it's called.

Cycle count measurement

I have a MS Visual Studio 2005 application solution. All the code is in C. I want to measure the number of cycles taken for execution by particular functions. Is there any Win32 API which I can use to get the cycle count?
I have used gettimeofday() to get the time in microseconds, but I want to know the number of cycles consumed.
Both Intel and AMD offer Windows libraries and tools to access the performance counters on their CPUs. These give access not only to cycle counts, but also to cache line hits and misses and TLB flush counts. The Intel tools are marketed under the name VTune, while AMD calls theirs CodeAnalyst.