Calls to typeinf and typeinf_ext in Julia profiling output - profiling

I am profiling my Julia code. A significant portion of the time is spent in typeinf and typeinf_ext (and their callees). Is this a sign of type uncertainty or type instability?

typeinf and typeinf_ext are Julia's type-inference routines, which run as part of JIT compilation, so time spent in them usually means you are measuring compilation rather than your own code's steady-state behavior. As the Julia language documentation puts it:
It's a good idea to first run the code you intend to profile at least
once (unless you want to profile Julia's JIT-compiler)

Related

C++ Code profiling/analysis for Mac and MPI

I am looking for a code analysis/profiling tool for C++ on macOS. I know there have been posts about this topic before, but my use case is quite specific, so maybe someone can give me slightly more specific advice.
So here is my problem: I am writing a scientific code (master's project) in C++, so it's a pure console application with no interactivity. The code is supposed to run on massively parallel computers, thus I use MPI. However, right now I am not yet optimizing for scalability, only for single-core performance. Since I do not want to rewrite the whole program as a serial one, I just run MPI with a single process. It works fine, but the profiling tool obviously needs to be able to cope with this.
What do I want to analyze? Well, the code is not very complex, in the sense that it has a very simple structure, so all I need is a list of how long the program spends in each function, so that I know where it loses the most time and can measure the speedup of my optimizations.
Thanks for all ideas
You should use Instruments.app, which includes a CPU sampler and thread activity viewer, among other things. (Choose "Product > Profile..." in Xcode.)
If you want something more fine-grained, you could instrument your code. Coincidentally, I wrote a set of profiling macros for just such an occasion :)
https://github.com/nielsbot/Profiler
This will show a nice nested printout of time spent in instrumented routines. (Instructions are on that page.)
Have you tried KCachegrind (http://kcachegrind.sourceforge.net/html/Home.html) together with Valgrind?
I can recommend http://www.scalasca.org/ . You can also use it for parallel performance analysis afterwards.
Don't look for "slow functions" and don't look to measure the time used by different pieces. Those concepts are so indirect as to be almost useless for telling you what to optimize.
Instead, take some stroboscopic X-rays, on wall-clock time, of what the entire program is doing, and study each one to see why the program is spending that instant of time.
The reason this works better is it's not looking with function-colored glasses. It's looking with purpose-colored glasses, and you can tell if the program needs to be doing what it's doing.
It's very accurate about locating big problems.
It is not accurate about measuring them, nor does it need to be.
What happens when you just measure is this: you get a bunch of numbers for a bunch of routines. You look at them and ask, "What does it mean?"
If it doesn't tell you what you should fix, you pat yourself on the back and say the program must be optimal.
In fact, there probably is something you could fix, but you couldn't figure it out from the profiler.
What's more, if you do find it and fix it, it can expose other things you can fix for even greater speedup.
That's what random pausing is about.

Locate the most costly methods and evaluate/profile them

I would like to know if there is a technique or tool that can tell you how much time is required to execute a particular method.
Something like big-O notation in mathematics/computer science gives you an idea of an algorithm's complexity; I wonder if there is something similar for code analysis.
Profiling is a means of analysing a program to determine the relative amount of time spent in particular functions or methods. It is useful for discovering performance issues in a program empirically. Using GCC, for example, you can:
Compile the program with the -pg option to enable profiling.
Run the executable to produce a file called gmon.out, which contains information about the runtime characteristics of your program as it actually ran.
Run gprof to display the information generated by the instrumented executable.
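For example, a minimal session might look like this (file and program names are illustrative):

```cpp
// example.cpp -- toy program to profile with gprof
//
// Compile with profiling enabled:     g++ -pg example.cpp -o example
// Run it (writes gmon.out):           ./example
// View the flat profile / call graph: gprof example gmon.out | less
#include <cstdio>

double slow_sum(long n) {
    double total = 0.0;
    for (long i = 0; i < n; ++i) total += i * 0.5;  // deliberate hot loop
    return total;
}

int main() {
    std::printf("%f\n", slow_sum(100000000L));
    return 0;
}
```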
Generally speaking, human analysis is the only way to discover the asymptotic (i.e., big-O) complexity of a particular algorithm—there is, so far as I know, no mechanical way to do it.
If you want to know how much time is spent in a function, use a so-called "profiler".
Complexity analysis is outside the scope of a profiler, though, since a profiler tells you what happens when you run a program once, whereas complexity tells you the limit behavior of what happens when you run an infinite sequence of programs with bigger and bigger input.
So: do you want to know which functions are most costly in your program (in which case find a profiler for your C++ implementation and follow its documentation), or do you want to know about time complexity (in which case you pretty much need a human to analyze your code)?
You should try running your code under callgrind: it will record which functions are called and how many times, though your code will run around 20x slower. Once you have the callgrind output, open it with kcachegrind to view the tree structure of the calls. There you can browse around and see where your bottlenecks are.
Useful links:
kcachegrind http://kcachegrind.sourceforge.net/html/Home.html
callgrind documentation http://valgrind.org/docs/manual/cl-manual.html
(valgrind is the framework, callgrind is a component)
how to start callgrind if the program spawns processes and you want to profile those too (replace sage bench.py with your program) https://github.com/titusnicolae/pynac-callgrind/blob/master/run.sh
edit: "--instr-atstart=no" should be removed from the parameter list if you don't plan on enabling the instrumentation later
How is this different from the halting problem?
Please note that I could trivially solve the halting problem given your automatic complexity analyzer: if it can determine any program's complexity, it can in particular decide whether that program halts at all. So your problem is HARDER, and the halting problem is already undecidable.

Speed - embedding python in c++ or extending python with c++

I have some big MySQL databases with data for calculations, and some parts where I need to get data from external websites.
I have used Python to do the whole thing until now, but what shall I say: it's not a speedster.
Now I'm thinking about mixing Python with C++, using Boost::Python and the Python C API.
The question I have now is: what is the better way to gain some speed?
Shall I extend Python with some C++ code, or shall I embed Python code in a C++ program?
I will surely get some speedup by using C++ code for the calculation parts, and I think that calling the Python interpreter inside a C application will not be better, because the Python interpreter would be running the whole time anyway. Also, I would have to wrap Python libraries like mysqldb or urllib3 to have a nice way to work with them inside C++.
So which would you suggest is the better way to go: extending or embedding?
(I love the Python language, but I'm also familiar with C++ and respect it for its speed.)
Update:
So I switched some parts from Python to C++ and used (real) multithreading in my C++ modules, and my program now needs 30 minutes instead of 7 hours :)
In principle, I agree with the first two answers. Anything coming from disk or across a network connection is likely to be a bigger bottleneck than the application.
All the research of the last 50 years indicates that people often have inaccurate intuitions about system performance. So IMHO, you really need to gather some evidence by measuring what is actually happening, then choose a solution based on that evidence.
To confirm what is causing the slow performance, measure the system and user time of your application (e.g., time python prog.py), and measure the load on the machine.
If the application is maxing out the CPU, and most of that time is spent in the application itself (user time), then there may be a case for using a more efficient technology for the application.
But if the CPU is not maxed out, or the application spends most of its time in the system (system time) rather than in the application (user time), then changing the application's programming technology is unlikely to help significantly. (This is an example of Amdahl's Law: http://en.wikipedia.org/wiki/Amdahl%27s_law.)
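To make that concrete, Amdahl's Law bounds the overall speedup by the fraction of runtime you can actually improve (the numbers below are illustrative):

```latex
% Overall speedup when a fraction p of the runtime is sped up by a factor s:
\[
  S_{\text{overall}} = \frac{1}{(1 - p) + \dfrac{p}{s}}
\]
% Example: if only p = 0.5 of the time is user-mode computation and C++
% makes that part s = 10x faster, S = 1/(0.5 + 0.05) \approx 1.8x overall.
```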
You may also need to measure the performance of your database server, and maybe your network connection, to identify the source of the bottleneck, but start with the easiest part.
In my opinion, in your case it makes no sense to embed Python in C++, while the reverse could be beneficial.
In most programs, performance problems are very localized, which means that you should rewrite the problematic code in C++ only where it makes sense, leaving Python for the rest.
This gives you the best of both worlds: the speed of C++ where you need it, and the ease of use and flexibility of Python everywhere else. What is also great is that you can do this step by step, replacing the slow code paths one by one, which leaves the whole application in a usable (and testable!) state at all times.
The reverse wouldn't make sense: you'd have to rewrite almost all the code, sacrificing the flexibility of the Python structure.
Still, as always when talking about performance: before acting, measure. If your bottleneck is not CPU- or memory-bound, switching to C++ is unlikely to bring much advantage.
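A minimal sketch of the "rewrite only the hot spots" approach described above, assuming Boost::Python (the module and function names are invented for illustration):

```cpp
// fastmath.cpp -- a tiny Boost.Python extension module (illustrative names)
//
// Build and link against boost_python, then from Python:
//   import fastmath
//   fastmath.sum_of_squares(10**8)   # the hot loop now runs in C++
#include <boost/python.hpp>

// The hot loop, moved out of Python.
double sum_of_squares(long n) {
    double total = 0.0;
    for (long i = 0; i < n; ++i)
        total += static_cast<double>(i) * static_cast<double>(i);
    return total;
}

BOOST_PYTHON_MODULE(fastmath) {
    boost::python::def("sum_of_squares", sum_of_squares);
}
```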

Price of switching control between C++ and Python

I'm developing a C++ application that is extended/scriptable with Python. Of course C++ is much faster than Python in general, but does that necessarily mean you should prefer to execute C++ code over Python code as often as possible?
I'm asking this because I'm not sure: is there any performance cost to switching control between code written in C++ and code written in Python? Should I use C++ code on every occasion, or should I avoid calling back into C++ for simple tasks, because any speed gain from executing C++ code is outweighed by the cost of switching between languages?
Edit: I should make this clear, I'm not asking this to actually solve a problem. I'm asking purely out of curiosity and it's something worth keeping in mind for the future. So I'm not interested in alternative solutions, I just want to know the answer, from a technical standpoint. :)
I don't know of a concrete rule for this, but a general rule that many follow is:
Prototype in Python. This is quicker to write, and may be easier to read and reason about.
Once you have a prototype, you can identify the slow portions that should be rewritten in C++ (through profiling).
Depending on the domain of your code, the slow bits are usually isolated in "inner loop" types of code, so the number of switches between Python and this code should be relatively small.
If your program is sufficiently fast, you've successfully avoided prematurely optimizing your code by writing too much of it in C++.
Keep it simple and tune performance as needed. The primary reason for embedding an interpreter in a C++ app is to allow run-time configuration/data to specify some processing - i.e. you can modify the script without recompiling the C++ program - that's your guide for when to call into the interpreter. Once in some interpreter call, the primary reasons to call back into C++ are:
to access or update some data that can't reasonably be exposed as a parameter to the call (or via some other registration process the interpreter supports)
to get better performance during some critical part of the processing
For the latter, try the script first (assuming it's as easy to develop there); then, if it's slow, identify where and how some C++ code might help. If/where performance does prove to be a problem, a general guideline when calling from C++ into the interpreter or vice versa is: try to line up as much work as possible, then make the call into the other system. If you get stuck, come back to Stack Overflow with a specific problem and actual code.
The cost is present but negligible, because you probably do a fair bit of work converting Python's high-level datatypes to C++-compatible representations anyway. It is similar to the overhead of calling one C++ function from another. The rules of thumb for when it is a good idea to switch from Python to C++ are:
A function with few arguments
A function which does a large amount of processing on a small amount of data
A function which is called as rarely as possible - consolidate function calls if possible
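The "consolidate calls" rule matters because every crossing pays the boundary and conversion overhead once. A sketch, again with invented names, assuming Boost::Python:

```cpp
// batching.cpp -- why consolidating boundary crossings pays off (sketch)
#include <boost/python.hpp>

// Called once per element from Python: n crossings, n conversions.
double scale_one(double x) { return x * 2.0; }

// Called once with the whole list: a single crossing for n elements.
boost::python::list scale_many(const boost::python::list& xs) {
    boost::python::list out;
    long n = boost::python::len(xs);
    for (long i = 0; i < n; ++i)
        out.append(2.0 * boost::python::extract<double>(xs[i])());
    return out;
}

BOOST_PYTHON_MODULE(batching) {
    boost::python::def("scale_one", scale_one);
    boost::python::def("scale_many", scale_many);
}
```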
The best metric is something that weighs up, for you:
Makes development, debugging and testing easier (lowers development cost)
Lowers the cost of maintenance
Meets the performance requirement (provides a solution)

What is profiling?

I am new to this and am trying to learn.
What is profiling?
What are various free tools for profiling .NET, Java EE?
Can Javascript be profiled?
If so, by which tool?
And lastly, how do these profilers work?
Profiling measures how long various parts of the code take to run. JavaScript can be profiled with Firebug: http://getfirebug.com/js.html
Profiling is measuring execution times and correlating them with various classes/methods/functions. (See the link I gave to the Wikipedia page for some commentary on how profilers can work.)
Think of profilers as debuggers for execution duration bugs.
Profilers are implemented a lot like debuggers too, except that rather than allowing you to stop the program and poke around, they simply let it run and keep track of how much time gets spent in every part of the program. This is particularly useful if you have some code that is running slower than you need it to run, as you can figure out exactly where all the time is going, and concentrate your efforts on fixing just that bottleneck.
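To illustrate the instrumenting flavor of that idea, here is a crude, hand-rolled sketch (not a real profiler API): an RAII timer that reports how long a scope took.

```cpp
#include <chrono>
#include <cstdio>

// Records the wall-clock duration of the enclosing scope on destruction.
struct ScopeTimer {
    const char* name;
    std::chrono::steady_clock::time_point start;
    explicit ScopeTimer(const char* n)
        : name(n), start(std::chrono::steady_clock::now()) {}
    ~ScopeTimer() {
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                      std::chrono::steady_clock::now() - start).count();
        std::fprintf(stderr, "%s took %lld us\n", name,
                     static_cast<long long>(us));
    }
};

void suspected_bottleneck() {
    ScopeTimer t("suspected_bottleneck");  // prints when the scope exits
    volatile double x = 0;
    for (long i = 0; i < 50000000L; ++i) x += i;
}

int main() { suspected_bottleneck(); }
```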
Many developers believe you should never hand-optimize code without using a profiler.
The way you would usually use your profiler is as follows:
Start the profiler, fire up your application using the profiler.
Use your application for some time or just the features in your application that you have identified as bottlenecks and would like to optimize.
Once your application is closed (or sometimes even before that), the profiler can present you with a breakdown of execution times per function. Some will also give you a breakdown of execution times per line within one of these functions, so you can see where the most CPU time was used, following a top-down approach.
Usually some functions in your application will take an unusually long time to execute. After looking at your profiling results, you should be able to identify them and eliminate performance problems.
Here are some .NET profilers for you to try (free):
Prof-it
NProf
CLR Profiler
I am not a big fan of these. I would recommend one of the commercial products to get the best results:
dotTrace
Ants
Other than that, take a look at Brad Abrams' blog posts "Profilers for the CLR" and ".NET Application Profiler".
I personally like dotTrace.
Profiling is a technique for measuring execution times and numbers of invocations of procedures.
It is not, however, the only or even necessarily the best way to locate the things that waste time in your code. Look here.
For a different Wikipedia article, try http://en.wikipedia.org/wiki/Performance_tuning#Bottlenecks
For a simple how-to, try http://www.wikihow.com/Optimize-Your-Program%27s-Performance
Wikipedia says:
In software engineering, performance analysis, more commonly today known as profiling, is the investigation of a program's behavior using information gathered as the program executes
Continue reading here http://en.wikipedia.org/wiki/Performance_analysis.
As for JavaScript tools, Firebug (http://getfirebug.com/index.html#install) is an excellent option.
Profiling is the measurement of execution time at the method level (functional statistics), as well as the collection of run-time information such as consumption of memory, processor, and threads, and the number of classes loaded (non-functional statistics), over the period the application is running. It falls under performance analysis (functional and non-functional statistics collection) of the application in question as run by one user. JConsole is one of the built-in tools for profiling Java applications.
Profiling is a technique of dynamic program analysis that measures, for example, the memory usage or time complexity of a program, the use of particular instructions, or the frequency and duration of function calls, to mention a few cases. Typically, profiling information is used to aid program optimization and, more specifically, performance engineering. Profiling can be achieved by instrumenting the program's source code. Profilers employ different methods, such as event-based, statistical, instrumented, and simulation methods.
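As a sketch of the "statistical" (sampling) method mentioned above, here is a minimal POSIX-only example; a real profiler would record the program counter for each sample rather than just counting them:

```cpp
#include <csignal>
#include <cstdio>
#include <sys/time.h>

static volatile sig_atomic_t g_samples = 0;

// SIGPROF handler: async-signal-safe, so it only counts samples.
static void on_prof(int) { g_samples = g_samples + 1; }

int main() {
    std::signal(SIGPROF, on_prof);
    itimerval iv{};
    iv.it_interval.tv_usec = 10000;  // sample every 10 ms of CPU time
    iv.it_value.tv_usec    = 10000;
    setitimer(ITIMER_PROF, &iv, nullptr);

    volatile double x = 0;           // busy work being sampled
    for (long i = 0; i < 200000000L; ++i) x += i;

    std::printf("collected %d samples\n", (int)g_samples);
    return 0;
}
```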