In my current project, some part of the code takes more than 30 minutes to complete. I found that the clock function would be a good choice for measuring a method's execution time, but is there any other way to find the most time-consuming line of code? Otherwise I would have to wrap every method with clock calls, which would be a complex process for me, because this is a really gigantic project and doing that would take forever.
The proper way to do this is profiling. It will give you useful per-function information: where the code spends most of its time, which functions are called most often, and so on. Some profilers are available at the compiler level (gcc, for example, has an option to enable it), or you can use third-party ones. Unfortunately, profiling itself affects the performance of the program, and you may see different timings than the real program shows with the profiler enabled, but it is usually a good starting point.
As for measuring the execution time of every line of code, that is not practical. First of all, not every line produces executable code, especially after the optimizer has run. On the other hand, it is pretty useless to tune code that is not compiled with optimization enabled.
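If you just need a rough number for one suspect section before setting up a full profiler, a minimal sketch like the following measures wall-clock time around a block with std::chrono (the function here is a made-up stand-in for your slow code):

#include <chrono>
#include <iostream>

// Stand-in for the slow part of the project.
void suspect_section() {
    volatile long sum = 0;
    for (long i = 0; i < 100000000L; ++i) sum += i;
}

int main() {
    auto start = std::chrono::steady_clock::now();
    suspect_section();
    auto stop = std::chrono::steady_clock::now();
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    std::cout << "suspect_section took " << ms << " ms\n";
}

That only narrows things down to a region; to get per-function data across the whole project you still want a real profiler (for example, build with gcc's -pg option and inspect the output with gprof).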
I have code with a loop that counts to 10000000000, and within that loop I do some calculations with conditional operators (if etc.). It takes about 5 minutes to reach that number. So, my question is: can I reduce the time it takes by creating a DLL and calling that DLL's functions to do the calculation and return the values to the main program? Will it make a difference in the time it takes to do the calculations? Further, will it improve the overall efficiency of the program?
By a “dll” I assume you mean going from managed .net code to un-managed “native” compiled code. Yes, this approach can help.
It depends a lot. Remember, the loop itself likely costs only about 25 seconds on a typical i3 (that is the overhead of counting to 10 billion while doing little else).
And I assume you went to the project properties, then Compile. On that screen, select Advanced Compile Options, and check “Remove integer overflow checks”. Also make sure your loop variables are integer types, for speed.
At that point the “base” loop that does nothing will drop from about 20 seconds down to about 6 seconds.
So that is the base loop speed; now it comes down to what we are doing inside that loop.
At this point, .net DOES HAVE a JIT (a just-in-time native compiler). This means your source code goes to “CLR” code, and then in turn that code gets compiled down to native x86 assembly code. So this “does” get the source code down to REAL machine code level. However, a JIT is certainly NOT as efficient, nor can it spend “time” optimizing the code, since the JIT has to work on the “fly” without you noticing it. So c++ (or VB6, which runs as fast as c++ when natively compiled) can certainly run faster, but the question then is by how much?
An optimizing compiler might get you another doubling in speed for the actual LOOPING code itself.
However, in BOTH cases (using .net managed code, or code compiled down to native Intel code), they BOTH LIKELY are calling the SAME routines to do the math!
In other words, if 80% of the time is spent in “library” code that does the math etc., then calling such code from c++ or calling such code from .net will make VERY LITTLE difference, since the BULK of the work is spent in the same system code!
The above concept is really “supervisor” mode vs. your application mode.
In other words, if most of the time is spent in system “library” code rather than in your own code, then the bulk of the heavy lifting is occurring in supervisor code. That means jumping from .net to native c++/vb6 dll’s will NOT yield much in the way of performance.
So I would first ensure loops and array index refs in your code are integer types. The above tip of taking off overflow checking will likely get you “close” to the speed of using a .dll. Worse, the time to “shuffle” the data to and from that external .dll routine will often cost you MORE than the time saved on the processing side.
And if your routines are doing database or file i/o, then all bets are off, as that is a VERY different problem.
So I would first test your application with the [x] Remove integer overflow checks option checked (so the checks are off). And make sure during testing that you use Ctrl-F5 in place of F5, to run your code WITHOUT debugging. The above overflow option will NOT show the increased speed when running in debug mode.
So it is hard to tell; it really depends on how much math (especially floating-point calls) you are doing (supervisor code) vs. just moving values around in arrays. If most of the code is moving things around, then I suggest the integer optimizations above, and going to a .dll likely will not help much.
Couldn't you use "Parallel.ForEach" and split this huge loop into equal pieces?
Or try working with BackgroundWorkers, or even threads (more than one!), to get the most out of the CPU and reduce the time spent.
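The question is about .NET, where Parallel.For with a per-thread accumulator is the direct way to do this; since the rest of this thread compares against native code, here is a minimal C++ sketch of the same chunking idea (split the range, give each thread its own chunk and its own partial result, then combine). The conditional inside the loop is just a stand-in for the question's calculations:

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    const std::int64_t total = 10000000000LL;   // the big count from the question
    const unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());

    std::vector<std::int64_t> partial(nthreads, 0);   // one slot per thread, no locking needed
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < nthreads; ++t) {
        std::int64_t begin = total / nthreads * t;
        std::int64_t end   = (t == nthreads - 1) ? total : total / nthreads * (t + 1);
        workers.emplace_back([&partial, t, begin, end] {
            std::int64_t sum = 0;
            for (std::int64_t i = begin; i < end; ++i)
                if (i % 3 == 0)          // stand-in for the question's conditional work
                    ++sum;
            partial[t] = sum;            // each thread writes only its own slot
        });
    }
    for (auto &w : workers) w.join();

    std::int64_t result = 0;
    for (auto p : partial) result += p;
    std::cout << "result: " << result << "\n";
}

The speedup you get this way is limited by how independent the per-iteration work really is.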
I built a Fortran code with Intel 11.1. I built it with the -p option in order to produce profiling data. When I check these results, there are some routines present that aren't a part of my code. I assume they were put there by Intel. They include:
__powr8i4
__intel_new_memset
__intel_fast_memset
__intel_fast_memset.J
__intel_fast_memcpy
__intel_new_memcpy
__intel_fast_memcpy.J
There are others, too. When I build the code without optimization, the code doesn't spend much time in them, except that the results show __powr8i4 being used 3.3% of the time. However, when I build the code with optimization, this number goes way up to about 35%. I can't seem to find out what these routines are, but they are confusing my results because I want to know where to look to optimize my code.
Most programs spend a lot of their cycles in the calling of subroutines, often library subroutines, so if you look only at exclusive (self) time, you will see what you are seeing.
So point 1 is look at inclusive (self plus callees) time.
Now, if the profiler is a "CPU profiler", it will probably be blind to I/O time. That means your program might be spending most of its time reading or writing, but the profiler will give you no clue about that.
So point 2 is use a profiler that works on "wall clock" time, not "CPU" time, unless you are sure you are not doing much I/O. (Sometimes you think you're not doing I/O, but several subroutine layers down, guess what - it's doing I/O.)
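A tiny illustration of that blind spot, assuming a C++ environment (the sleep stands in for a blocking read or write):

#include <chrono>
#include <ctime>
#include <iostream>
#include <thread>

int main() {
    std::clock_t cpu0 = std::clock();
    auto wall0 = std::chrono::steady_clock::now();

    // Stand-in for a blocking read/write: the process waits, using almost no CPU.
    std::this_thread::sleep_for(std::chrono::seconds(2));

    std::clock_t cpu1 = std::clock();
    auto wall1 = std::chrono::steady_clock::now();

    double cpu_s  = double(cpu1 - cpu0) / CLOCKS_PER_SEC;
    double wall_s = std::chrono::duration<double>(wall1 - wall0).count();

    // Typical output: cpu ~0.00 s, wall ~2.00 s.
    // A profiler that only counts CPU time sees almost nothing here.
    std::cout << "cpu: " << cpu_s << " s, wall: " << wall_s << " s\n";
}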
Many profilers try to produce a call-graph, and if your program does not contain recursion, and if the profiler has access to all the routines in your code, that can be helpful in identifying the subroutine calls in your code that account for a lot of time.
However, if routine A is large and calls B in several places, the profiler won't tell you which lines of code to look at.
Point 3 is use a profiler that gives you line-level inclusive time percentage, if possible.
(Percentage is the most useful number, because that tells you how much overall time you would save if you could somehow remove that line of code. Also, it is not much affected by competing processes in the system.)
One example of such a profiler is Zoom.
It may be that after you do all this, you don't see much you could do to speed up the code.
However, if you could see how certain properties of the data might affect performance, you might find there were further speedups you could get. Profilers are unable to look at data.
What I do is randomly sample the state of the program under the debugger, and see if I can really understand what it is doing at each sample.
You can find things that way that you can't find any other way.
(Some people say this is not accurate, but it is accurate - about what matters. What matters is what the problem is, not precisely how much it costs.)
And that is point 4.
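If you want a crude automated version of that random-sampling idea on Linux, rather than pausing in the debugger by hand, something like the following sketch can work. It is only an illustration of the idea: backtrace() is not formally async-signal-safe, so treat it as a rough diagnostic, not production code.

// Build with: g++ -g -rdynamic sample.cc  (-rdynamic makes the symbols readable)
#include <csignal>
#include <execinfo.h>
#include <sys/time.h>
#include <unistd.h>

// On each SIGPROF, write the current call stack to stderr.
static void on_sample(int) {
    void *frames[64];
    int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    (void)write(STDERR_FILENO, "----\n", 5);
}

int main() {
    std::signal(SIGPROF, on_sample);

    itimerval timer{};
    timer.it_interval.tv_sec = 1;   // roughly one sample per second of CPU time
    timer.it_value.tv_sec = 1;
    setitimer(ITIMER_PROF, &timer, nullptr);

    // Stand-in for the real workload being investigated.
    volatile double x = 0;
    for (long i = 0; i < 2000000000L; ++i) x = x + i * 0.5;
    return 0;
}

The manual debugger approach described above still has the advantage that you can inspect the data at each pause, not just the stack.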
I am doing a study comparing profilers, mainly instrumenting vs. sampling.
I have come up with the following info:
sampling: stop the execution of the program, take the PC, and thus deduce where the program is
instrumenting: add some overhead code to the program so that it increments some counters, letting you know where the program spends its time
If the above info is wrong, correct me.
After this I was looking at execution time, and some said that instrumenting takes more time than sampling! Is this correct?
If yes, why is that? With sampling you have to pay the price of context switching between processes, while with instrumenting you stay in the same program, so there is no such cost.
Am I missing something?
cheers! =)
The interrupts generated by a sampling profiler generally add an insignificant amount of time to the total execution time, unless you have a very short sampling interval (e.g. < 1 ms).
With instrumented profiling there can be a large overhead, e.g. on small leaf functions that get called many times, as the calls to the instrumentation library can be significant compared to the execution time of the function.
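To make that overhead concrete: with GCC/Clang's -finstrument-functions, the compiler wraps every function entry and exit in a call to a hook, and for a tiny leaf function those two extra calls can easily cost more than the function body itself. A minimal sketch (the counter is just for illustration):

// Build with: g++ -finstrument-functions instr.cc
#include <cstdio>

extern "C" {
    void __cyg_profile_func_enter(void *, void *) __attribute__((no_instrument_function));
    void __cyg_profile_func_exit(void *, void *)  __attribute__((no_instrument_function));
}

static unsigned long long calls = 0;

// The compiler inserts a call to these hooks around every instrumented function.
extern "C" void __cyg_profile_func_enter(void *, void *) { ++calls; }
extern "C" void __cyg_profile_func_exit(void *, void *)  {}

// A tiny leaf function: the two hook calls cost more than the body itself.
__attribute__((noinline)) int add(int a, int b) { return a + b; }

int main() {
    long long sum = 0;
    for (int i = 0; i < 100000000; ++i)
        sum += add(i, i);
    std::printf("sum=%lld, hook calls=%llu\n", sum, calls);
}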
It depends how conventional you want to be.
gprof does both those things you've mentioned. Here are some comments on that.
There is a school of thought that says profiling is about measuring. Measuring what? Well, anything - just measuring. Along with this goes the idea that what you want to get is a "big picture" of what's happening.
This school looks mostly at trying to find "slow functions", without clearly defining what that even means, and telling you to look there to optimize.
Another school says that you are really debugging. You want to precisely locate bugs of a certain kind - ones that don't make the program incorrect, rather they take too long. These are not big-picture things. They are very precise points in the code where something is happening that costs a lot more time than necessary.
Exactly how much more is not important. What's important is that it is located so it can be fixed.
In this viewpoint, profiling overhead is irrelevant, and so is accuracy of measurement.
What measuring is for is seeing how much time was saved.
One profiler that, I think, successfully spans both camps, is Zoom, because it samples the call stack, on wall-clock time, and presents, at the line/instruction level, percent of time on the stack. Some other profilers do this also, but most don't.
I'm in the second school, and here's an example of what you can accomplish with it.
Here's a more brief discussion of the issues.
I have a program I want to profile with gprof. The problem (seemingly) is that it uses sockets. So I get things like this:
::select(): Interrupted system call
I hit this problem a while back, gave up, and moved on. But I would really like to be able to profile my code, using gprof if possible. What can I do? Is there a gprof option I'm missing? A socket option? Is gprof totally useless in the presence of these types of system calls? If so, is there a viable alternative?
EDIT: Platform:
Linux 2.6 (x64)
GCC 4.4.1
gprof 2.19
The socket code needs to handle interrupted system calls regardless of the profiler, but under a profiler it's unavoidable. This means having code like:
if ( errno == EINTR ) { ...
after each system call.
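For example, a small wrapper around the ::select() call from the question might look like this (the wrapper name is mine):

#include <cerrno>
#include <sys/select.h>

// Retry select() when it is interrupted by a signal (e.g. the profiler's
// SIGPROF) instead of treating EINTR as a failure.
int select_retry(int nfds, fd_set *rd, fd_set *wr, fd_set *ex, timeval *tv) {
    // Note: whether tv still holds the remaining time after EINTR is
    // system-dependent; recompute it if you need a strict overall deadline.
    int rc;
    do {
        rc = select(nfds, rd, wr, ex, tv);
    } while (rc == -1 && errno == EINTR);
    return rc;
}

int main() {
    fd_set rd;
    FD_ZERO(&rd);
    FD_SET(0, &rd);                 // watch stdin, just as a demo
    timeval tv;
    tv.tv_sec = 5;                  // 5 second timeout
    tv.tv_usec = 0;
    int rc = select_retry(1, &rd, nullptr, nullptr, &tv);
    return rc < 0 ? 1 : 0;
}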
Take a look, for example, here for the background.
gprof (here's the paper) is reliable, but it was only ever intended to measure changes, and even for that, it only measures CPU-bound issues. It was never advertised to be useful for locating problems. That is an idea that other people layered on top of it.
Consider this method.
Another good option, if you don't mind spending some money, is Zoom.
Added: If I can just give you an example. Suppose you have a call hierarchy where Main calls A some number of times, A calls B some number of times, B calls C some number of times, and C waits for some I/O with a socket or file, and that's basically all the program does. Now, further suppose that the number of times each routine calls the next one down is 25% more than it really needs to be. Since 1.25^3 is about 2, that means the entire program takes twice as long to run as it really needs to.
In the first place, since all the time is spent waiting for I/O, gprof will tell you nothing about how that time is spent, because it only looks at "running" time.
Second, suppose (just for argument) it did count the I/O time. It could give you a call graph, basically saying that each routine takes 100% of the time. What does that tell you? Nothing more than you already know.
However, if you take a small number of stack samples, you will see on every one of them the lines of code where each routine calls the next.
In other words, it's not just giving you a rough percentage time estimate, it is pointing you at specific lines of code that are costly.
You can look at each line of code and ask if there is a way to do it fewer times. Assuming you do this, you will get the factor of 2 speedup.
People get big factors this way. In my experience, the number of call levels can easily be 30 or more. Every call seems necessary, until you ask if it can be avoided. Even small numbers of avoidable calls can have a huge effect over that many layers.
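To make the shape of that example concrete, here is a toy C++ sketch (the names and the counts are invented to mirror the 25% figure above):

#include <chrono>
#include <thread>

// Stand-in for the I/O wait at the bottom of the hierarchy.
void C() { std::this_thread::sleep_for(std::chrono::milliseconds(10)); }

void B() { for (int i = 0; i < 5; ++i) C(); }   // suppose only 4 calls are really needed
void A() { for (int i = 0; i < 5; ++i) B(); }   // same here
int main() { for (int i = 0; i < 5; ++i) A(); } // and here

// A wall-clock stack sample taken at a random moment will almost always show
//   main -> A -> B -> C -> sleep/IO
// with the exact call lines (the for-loops above) on the stack, which is where
// you would ask "does this call really need to happen this often?"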
I have a performance issue where I suspect one standard C library function is taking too long and causing my entire system (a suite of processes) to basically "hiccup". Sure enough, if I comment out the library function call, the hiccup goes away. This prompted me to investigate: what standard methods are there to prove this type of thing? What would be the best practice for testing a function to see if it causes an entire system to hang for a second (causing other processes to be momentarily starved)?
I would at least like to definitively correlate the function being called and the visible freeze.
Thanks
The best way to determine this stuff is to use a profiling tool to get the information on how long is spent in each function call.
Failing that, set up a function that reserves a block of memory. Then, at various points in your code, write a string into that memory, including the current time. (This avoids the delays associated with writing to the display.)
After you have run your code, pull out the memory and parse it to determine how long parts of your code are taking.
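A minimal C++ sketch of that idea (the class and tag names are mine): reserve the buffer up front, stamp events as the code runs, and only format and print afterwards.

#include <chrono>
#include <cstdio>
#include <vector>

// Pre-reserved in-memory event log: recording a timestamp is just an append,
// far cheaper than printing to the console while the code runs.
struct EventLog {
    struct Entry { std::chrono::steady_clock::time_point t; const char *tag; };
    std::vector<Entry> entries;

    explicit EventLog(std::size_t capacity) { entries.reserve(capacity); }

    void mark(const char *tag) {                  // call this at interesting points
        entries.push_back({std::chrono::steady_clock::now(), tag});
    }

    void dump() const {                           // parse/print after the run
        for (std::size_t i = 1; i < entries.size(); ++i) {
            auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                          entries[i].t - entries[i - 1].t).count();
            std::printf("%s -> %s : %lld us\n",
                        entries[i - 1].tag, entries[i].tag, (long long)us);
        }
    }
};

int main() {
    EventLog log(1024);
    log.mark("start");
    // ... the suspect library call would go here ...
    log.mark("after suspect call");
    log.dump();
}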
I'm trying to figure out what you mean by "hiccup". I'm imagining your program does something like this:
while (...){
// 1. do some computing and/or file I/O
// 2. print something to the console or move something on the screen
}
and normally the printed or graphical output hums along in a subjectively continuous way, but sometimes it appears to freeze, while the computing part takes longer.
Is that what you meant?
If so, I suspect that in the running state it is almost always in step 2, but in the hiccup state it is spending time in step 1.
I would comment out step 2, so it would spend nearly all its time in the hiccup state, and then just pause it under the debugger to see what it's doing.
That technique tells you exactly what the problem is with very little effort.