IronPython Profiler

I need to find a good IronPython profiler. On the net I found several Python profilers, but they either won't work under IronPython or are no longer available. The best one so far is the one described here. It prints method names, inclusive/exclusive times, and the number of calls. But it seems to be buggy: it doesn't print ALL of the methods, and for some reason the inclusive and exclusive times are always the same.
Does anyone know about another way of profiling IronPython 2.7.5 code?

Related

Time-based profiling using Clang instrumentation

Clang's -fprofile-instr-generate option can record the number of times each line of code (and even parts of a line of code) is executed. There is some overhead but it is pretty minimal.
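For reference, the counting workflow looks roughly like this (the flags and tools are standard Clang/LLVM; the toy program and file names are invented for illustration):

    // demo.cc - toy program for per-line execution counts with Clang.
    // Build and inspect (file names are arbitrary):
    //   clang++ -O1 -fprofile-instr-generate -fcoverage-mapping demo.cc -o demo
    //   LLVM_PROFILE_FILE=demo.profraw ./demo
    //   llvm-profdata merge demo.profraw -o demo.profdata
    //   llvm-cov show ./demo -instr-profile=demo.profdata
    #include <cstdio>

    long work(long n) {
        long total = 0;
        for (long i = 0; i < n; ++i)   // llvm-cov reports how often each line ran
            total += i % 7;
        return total;
    }

    int main() {
        std::printf("%ld\n", work(1000000));
        return 0;
    }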
Is there a way to get Clang to do something similar, but record the total execution time of a line of code rather than the number of times it was run?
I know there are sample-based profilers (perf, etc.) but these seem to suck - e.g. as far as I can tell they sample the call stack so you don't get line-level information.
I am ok with a significant overhead (e.g. 100%) as long as it doesn't distort the relative timings too much (+/-30% is fine).
There does seem to be something like this - it's called XRay and was developed by Google.
It doesn't go as far as line-by-line profiling, or even the basic-block level as far as I can tell. The granularity is limited to functions, but you can control exactly which functions are instrumented (by default, those with more than 100 instructions) and even turn the instrumentation on and off at runtime.
It seems to be at a fairly early stage of development and only works on Linux. Looks useful nonetheless.
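Here's a rough sketch of what that function-level control looks like (the attribute and interface names are taken from the XRay docs; treat the details as approximate, given how young the tooling is):

    // xray_demo.cc - sketch of Clang XRay function instrumentation.
    // Build (Linux only for now):
    //   clang++ -fxray-instrument -fxray-instruction-threshold=1 xray_demo.cc -o xray_demo
    // Run with basic logging enabled, then inspect the log with llvm-xray:
    //   XRAY_OPTIONS="patch_premain=true xray_mode=xray-basic" ./xray_demo
    #include <cstdio>
    #include <xray/xray_interface.h>  // ships with compiler-rt

    // Force instrumentation regardless of the instruction-count threshold.
    [[clang::xray_always_instrument]] long hot_loop(long n) {
        long total = 0;
        for (long i = 0; i < n; ++i) total += i;
        return total;
    }

    // Opt this function out entirely.
    [[clang::xray_never_instrument]] void boring() {}

    int main() {
        std::printf("%ld\n", hot_loop(1000000));
        __xray_unpatch();   // turn the instrumentation off at runtime...
        boring();
        __xray_patch();     // ...and back on again
        std::printf("%ld\n", hot_loop(1000000));
        return 0;
    }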
Edit: gperftools actually works very well for this (I guess I dismissed it earlier because pprof didn't use to work at all on Mac, but I fixed that). I strongly recommend the -http option of pprof: it gives you a cool interactive interface with source code, a call graph, a flame graph, and so on.

Speed: embedding Python in C++ or extending Python with C++

I have some big MySQL databases with data for calculations, and for some parts I need to get data from external websites.
I have used Python for the whole thing until now, but what can I say: it's no speedster.
Now I'm thinking about mixing Python with C++, using Boost::Python and the Python C API.
The question I've got now is: what is the better way to get some speed?
Should I extend Python with some C++ code, or should I embed Python code into a C++ program?
I will surely get some speed increase by using C++ code for the calculating parts, and I think that calling the Python interpreter inside a C++ application will not be better, because the Python interpreter would have to run the whole time. And I would have to wrap Python libraries like MySQLdb or urllib3 to have a nice way to work with them inside C++.
So which would you suggest is the better way to go: extending or embedding?
(I love the Python language, but I'm also familiar with C++ and respect it for its speed.)
Update:
I switched some parts from Python to C++ and used (real) multithreading in my C modules, and my program now needs 30 minutes instead of 7 hours. :)
In principle, I agree with the first two answers. Anything coming from disk or across a network connection is likely to be a bigger bottleneck than the application itself.
All the research of the last 50 years indicates that people often have inaccurate intuitions about system performance. So IMHO, you really need to gather some evidence by measuring what is actually happening, then choose a solution based on that evidence.
To confirm what is causing the slow performance, measure the system and user time of your application (e.g. time python prog.py) and measure the load on the machine.
If the application is maxing out the CPU, and most of that time is spent in the application itself (user time), then there may be a case for using a more efficient technology for the application.
But if the CPU is not maxed out, or the application spends most of its time in the system (system time) rather than in the application itself (user time), then it is unlikely that changing the application's programming technology will help significantly. (This is an example of Amdahl's Law: http://en.wikipedia.org/wiki/Amdahl%27s_law.)
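To put a number on that, Amdahl's Law gives the overall speedup from accelerating one part as

    S = \frac{1}{(1 - p) + p/s}

where p is the fraction of total runtime spent in the part you rewrite and s is how much faster that part becomes. As an invented example: if only 40% of the wall time is CPU-bound Python code and C++ makes that part 20 times faster, the whole program only gets 1 / (0.6 + 0.4/20) ≈ 1.6 times faster.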
You may also need to measure the performance of your database server, and maybe the network connection, to identify the source of the bottleneck, but start with the easiest part.
In my opinion, in your case it makes no sense to embed Python in C++, while the reverse could be beneficial.
In most programs, the performance problems are very localized, which means that you should rewrite the problematic code in C++ only where it makes sense, leaving Python for the rest.
This gives you the best of both worlds: the speed of C++ where you need it, and the ease of use and flexibility of Python everywhere else. What is also great is that you can do this step by step, replacing the slow code paths bit by bit, always leaving the whole application in a usable (and testable!) state.
The reverse wouldn't make sense: you'd have to rewrite almost all the code, sacrificing the flexibility of the Python structure.
Still, as always when talking about performance, measure before acting: if your bottleneck is not CPU- or memory-bound, switching to C++ isn't likely to produce much advantage.
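To make the "extend" route concrete, here is a minimal sketch of a C++ extension module written against the plain Python 2 C API (the module name, its function, and the build line are invented for illustration; Boost::Python would cut down the boilerplate). Note the Py_BEGIN_ALLOW_THREADS block: it releases the GIL so the C++ part can run on real threads in parallel, which is what made the questioner's multithreaded C modules pay off:

    // fastcalc.cc - hypothetical C++ extension module for a hot numeric loop.
    // Build roughly like:
    //   g++ -O2 -shared -fPIC -I/usr/include/python2.7 fastcalc.cc -o fastcalc.so
    #include <Python.h>

    // The expensive computation, written in pure C++ with no Python objects.
    static double heavy_sum(long n) {
        double total = 0.0;
        for (long i = 0; i < n; ++i)
            total += static_cast<double>(i % 97) * 1.0000001;
        return total;
    }

    static PyObject* fastcalc_heavy_sum(PyObject* self, PyObject* args) {
        long n;
        if (!PyArg_ParseTuple(args, "l", &n))
            return NULL;
        double result;
        Py_BEGIN_ALLOW_THREADS      // release the GIL: other Python threads keep running
        result = heavy_sum(n);
        Py_END_ALLOW_THREADS
        return Py_BuildValue("d", result);
    }

    static PyMethodDef FastcalcMethods[] = {
        {"heavy_sum", fastcalc_heavy_sum, METH_VARARGS, "Sum a long series in C++."},
        {NULL, NULL, 0, NULL}
    };

    // Python 2-style module init, matching the era of the question.
    PyMODINIT_FUNC initfastcalc(void) {
        Py_InitModule("fastcalc", FastcalcMethods);
    }

From Python it is then just import fastcalc; fastcalc.heavy_sum(100000000), and the rest of the program stays untouched.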

How to profile code more accurately and precisely on Windows Vista?

Hey guys, I have a question:
how can I profile code more accurately and precisely?
My OS is Windows Vista and my processor is an Intel Centrino.
Right now I am compiling my C++ code with the "-O0 -o" options on g++, and I am profiling using Windows' QueryPerformanceCounter and related APIs.
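In case it matters, here is a simplified sketch of the kind of timing harness I mean (the wrapper function is only an illustration; QueryPerformanceFrequency and QueryPerformanceCounter are the actual Win32 calls):

    // qpc_timer.cc - illustrative high-resolution timing on Windows.
    #include <windows.h>
    #include <cstdio>

    // Elapsed seconds between two QueryPerformanceCounter readings.
    static double elapsed_seconds(LARGE_INTEGER start, LARGE_INTEGER stop) {
        LARGE_INTEGER freq;
        QueryPerformanceFrequency(&freq);   // counter ticks per second
        return double(stop.QuadPart - start.QuadPart) / double(freq.QuadPart);
    }

    int main() {
        LARGE_INTEGER start, stop;
        QueryPerformanceCounter(&start);

        volatile long total = 0;            // volatile: keep the loop from being optimized away
        for (long i = 0; i < 10000000; ++i) total += i;

        QueryPerformanceCounter(&stop);
        std::printf("%.6f s\n", elapsed_seconds(start, stop));
        return 0;
    }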
Accuracy and precision are two different things:
if you are shooting at something, hitting the target rather than missing is accuracy,
and over many shots, hitting the same place rather than scattering across many different places is precision.
Thank you,
There's lots of discussion on this. If the reason for profiling is to discover ways to make the code faster, what helps you the most is to find them with certainty. Measuring the amount of time they cost with precision is much less important.
Here's an example in another language and OS, but the principle is the same.
Have you tried AMD's CodeAnalyst tool? It's a free download and may be helpful.

How compatible is WPS with SAS?

How compatible is the WPS SAS-clone with the corresponding products from SAS Institute?
Has anyone tried it - if so: have you run into any compatibility issues?
WPS has a good comparison document available on their website. It lists areas where WPS and SAS are compatible and where they are not. Start at http://www.teamwpc.co.uk/products/wps/language
I have tried it, although briefly and three years ago. A former co-worker of mine uses it exclusively as an alternative to SAS. He manages to function mostly without problems, but does run into some syntax-related oddities in some of the more obscure procedures and functions.
WPS does offer a free trial, so you've nothing to lose with that. Give it a shot?
I don't really have an answer, but have you tried SAS-L? I'm sure you'll find people there with strong opinions about this. Also there are a couple of threads available already, which might get you started.
My 2 cents:
I've tried WPS for some time, and it does indeed make for a working SAS substitute as long as you keep away from some of the more advanced features of the base language.
A few examples from my own experience:
I wasn't able to use the CALL MODULE* family for calling Win32 API functions.
When reading one row at a time through FETCH(), you cannot use CALL SET() to automatically initialize all the variables found in the input data set.
I hit some random errors using macros; sorry, I don't remember the details.
In a few words: if you have a working SAS installation, ask for a WPS trial and test whether it fits your use. If it does, rest assured it is a tremendous saving in licensing costs.

C and C++ source code profiling tools [duplicate]

Possible Duplicate:
What's your favorite profiling tool (for C++)
Are there any good tools to profile source code that is a mix of C and C++? What are the pros and cons of each, and which ones have you used and would recommend? Please do not just get me a list of tools from Google; I can do that too. What I want is to leverage the personal experience of someone who has used these tools and knows their pros and cons.
Thanks in advance.
I've found gprof to be the best CPU hotspot profiler, and Google Performance Tools to be the best sampling profiler. Both work for C and C++.
In my opinion there are no good profiling tools on Windows.
GNU gprof pros and cons
GCC only
Works with C and C++
Only measures CPU time, and only code inside the binary; everything you wish to profile must be statically linked in
Very accurate
Adds a small overhead to execution
Google Performance Tools pros and cons
I think it requires the GNU toolchain
Occasionally fails to identify symbols
Very customizable
Outputs to a huge variety of formats, including the Callgrind format, and automatically loads KCacheGrind for you
Has various memory profiling tools also
Is a sampling profiler, with minimal overhead
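As a concrete example, the CPU profiler can be driven either through the CPUPROFILE environment variable or explicitly from code; here is a minimal sketch (the workload function is made up):

    // gperf_demo.cc - sketch of driving the gperftools CPU profiler from code.
    // Build:    g++ -O2 gperf_demo.cc -lprofiler -o gperf_demo
    // Inspect:  pprof --text ./gperf_demo hot.prof
    #include <gperftools/profiler.h>
    #include <cstdio>

    static double spin(long n) {
        double total = 0.0;
        for (long i = 0; i < n; ++i) total += i * 0.5;
        return total;
    }

    int main() {
        ProfilerStart("hot.prof");      // samples are written to this file
        std::printf("%f\n", spin(200000000));
        ProfilerStop();                 // flush and close the profile
        return 0;
    }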
Related useful questions and answers
Alternative to -pg with Clang?
What's your favorite profiling tool (for C++)
Alternatives to gprof
C++ Code Profiler
Confusing gprof output
I would respectfully disagree with Matt.
The tool I use all the time on Windows is the random-pausing technique, and it works with any language the IDE supports.
As an example of using it to do performance tuning, this case shows how a speedup of 43 times was achieved through a series of steps.
Gprof has a lot of problems, listed here, and according to the google-perftools manual some of the same issues recur there, such as reporting procedures rather than lines, emphasizing self (local) time, and emphasizing the graph. (I can't tell from the docs whether it samples while blocked.)
As software systems become ever larger, self time becomes less and less relevant. The program counter spends most of its time in library routines or blocked in the system.
Graphs become gigantic nests.
People ask "I know function X is costly, but where in function X is the problem?"
What's more, the "bottlenecks" get bigger and bigger, because the stack gets deeper on average, and every layer of the stack is a fresh opportunity to do more function calls than necessary.
An example of a stack sampler that reports percent by line, samples while blocked, and lets the user control sampling (so as not to dilute the sample set during user input) is Zoom.
EDIT: Sorry, can't leave well enough alone. Here's a new explanation:
The way programs work, they trace out a call tree, which is a lot like the oak tree outside my window. It has a trunk (main) which sprouts branches (call sites) which sprout further branches for several levels out to leaves (instructions) and acorns (blocking calls).
When the tree surgeon comes to prune (optimize) it, does he look only where the leaves are (hotspots)? Does he ignore acorns (no samples during blocking)?
No, he looks for branches (call sites) that are both heavy (on the stack a lot) and unhealthy (unnecessary). Those are what he prunes.
That's what random-pausing and Zoom do: they help find those call sites.
You can use Callgrind to create profiling output; it is part of Valgrind.
Callgrind output can be loaded into KCacheGrind, which is probably worth a look as long as you're using Linux.
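If the interesting code is buried deep in a long run, Callgrind's client requests also let you limit instrumentation to a region; here is a rough sketch (the macros come from valgrind/callgrind.h; the workload is invented):

    // cg_demo.cc - sketch of scoping Callgrind instrumentation to one region.
    // Run with:  valgrind --tool=callgrind --instr-atstart=no ./cg_demo
    #include <valgrind/callgrind.h>
    #include <cstdio>

    static long setup(long n) {             // uninteresting part, left unprofiled
        long total = 0;
        for (long i = 0; i < n; ++i) total += i;
        return total;
    }

    static long interesting(long n) {       // the region we actually care about
        long total = 0;
        for (long i = 0; i < n; ++i) total += i % 13;
        return total;
    }

    int main() {
        setup(1000000);
        CALLGRIND_START_INSTRUMENTATION;    // begin collecting here
        long r = interesting(5000000);
        CALLGRIND_STOP_INSTRUMENTATION;
        CALLGRIND_DUMP_STATS;               // write callgrind.out.<pid> now
        std::printf("%ld\n", r);
        return 0;
    }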
AMD CodeAnalyst is pretty nice. It's also cross-platform, which helps when you find a platform-specific bottleneck.