Confused about profiling result - C++

I have built my program with "-g -O2" and ran it under Valgrind's callgrind tool. I am unsure how to interpret the output. Here is the output:
http://daviddoria.com/Uploads/callgrind.CacheMisses
My "whole program" is the InpaintingAlgorithm function that is 98.4% of "main". So far so good. Now looking at the callees of InpaintingAlgorithm, 92.9% of InpaintingAlgorithm is LinearSearchKNNProperty::operator(). This is my "inner loop", and again I expect a huge amount of the time to be spent here.
Now here is where I get confused. Looking at the callees of LinearSearchKNNProperty::operator(), there is really nothing there?? The largest function is only 7.64%, and the rest are < 0.25%. I don't understand how the sum of all of the callees only adds to about 8%. Where is the other 92%?? (Presumably the stuff I would be looking for to make it go faster!)
If anyone could point me to my error in reading these results, I would appreciate it!
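For intuition, here is an illustration (not the poster's code) of how a function's callees can sum to almost nothing: a call-graph profiler splits a function's inclusive cost into its callees' costs plus its own "self" cost, i.e. the cycles spent in the function's own instructions. A loop-heavy function is mostly self cost, and under -O2 small helpers are typically inlined, which folds their cycles into the caller's self cost as well.

// Hypothetical sketch: ~100% inclusive cost, ~100% self cost, ~0% in callees.
#include <cstddef>

// Small helper; under -O2 this is usually inlined into the caller,
// so its cycles are attributed to linear_search's self cost.
static inline float squared_diff(float a, float b) { return (a - b) * (a - b); }

float linear_search(const float* data, std::size_t n, float query) {
    float best = 1e30f;
    for (std::size_t i = 0; i < n; ++i) {
        float d = squared_diff(data[i], query);  // the work happens right here
        if (d < best) best = d;
    }
    return best;  // a call-graph view of this function shows almost no callees
}

In a callgrind viewer, checking the "self" column (rather than only the callee list) shows where those remaining cycles went.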

Related

Writing out arrays fixes a problem for a reason unknown to me

When I run certain programs written in FORTRAN 77 with the GNU gfortran compiler, I have come across the same problem several times, and I'm hoping someone has insight. The value I want should be ~1, but it is written out at the end of the program as something well over 10^100. For me this problem is generally restricted to arrays. The improper value often goes away when I write out the value of the variable at some earlier stage in the program (which inevitably happens while trying to troubleshoot the issue).
I have tried initializing arrays explicitly, and have tried array bound checking as well as some internal logical checks in my program. The issue, to me and my research supervisor, seems pathological.
WRITE(*,*)"OK9999", INV,INVV
vs
WRITE(*,*)"OK9999", INV,INVV,NP,I1STOR(NP),I2STOR(NP)
The former gives incorrect values for INV and INVV; the latter gives correct ones. This is the newest example of this problem, which has affected me on and off for about a year.
The greater context of these lines is:
WRITE(*,*)"AFTER ENERGY",I1STOR(1),I2STOR(1)
DO 167 NP=1!,NV
IF(I1STOR(NP).NE.0) THEN
INV = I1STOR(NP)
INVV = I2STOR(NP)
WRITE(*,*)"OK9999", INV,INVV,NP,I1STOR(NP),I2STOR(NP)
PAUSE
ENDIF
I1STOR(1) and I2STOR(1) are written correctly in the first case, "AFTER ENERGY", above. If I write out the value of NP right after the DO 167 line, this also remedies the situation.
My expectation would be that writing out a variable shouldn't affect its value. Often I am doing large, time-intensive calculations in which the ultimate value is way off, and in many cases it has traced back to this situation, where writing the value out (to screen or file) magically alleviates the problem. Any help would be sincerely appreciated.
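The symptom described - an extra WRITE changing a computed value - is the classic signature of memory corruption, e.g. an out-of-bounds subscript or an uninitialized variable: the extra statement shifts the memory layout or register allocation, moving the damage somewhere harmless. One way to make such bugs fail loudly under gfortran (a sketch; flag availability depends on the gfortran version, and prog.f stands in for the real source file):

$ gfortran -g -O0 -Wall -fcheck=all -finit-real=snan -ffpe-trap=invalid prog.f -o prog

-fcheck=all aborts with a message at the first out-of-bounds subscript, and -finit-real=snan fills uninitialized reals with signaling NaNs, so that using them traps instead of silently yielding garbage on the order of 10^100.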

Can the return value from finish in gdb be different from the actual one in execution?

I am a gdb novice trying to debug some GSSAPI code, and I was using fin to see the return value of the current frame. As seen in the snippet pasted below, the call to gssint_mechglue_initialize_library() seems to return 0, but the actual check seems to fail. Can someone please point out if I am missing something obvious here?
Thanks in advance!
One possible explanation for the observed behavior is that you are debugging optimized code, and that line 1001 isn't really executed.
You can confirm this with a few next commands, or by executing fin again and observing whether GSS_S_COMPLETE or something else is returned from gssint_select_mech_type.
When optimization is on, code motion performed by the optimizer often prevents correct assignment of instructions to line numbers (instructions "belonging" to different lines are mixed and reordered). This often makes the code appear to "jump around" when stepping with e.g. the nexti command.
For ease of debugging, recompile with -O0, or make sure to remove any -O2 and the like from your compile lines.
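For example, with a typical make-based build (the variable names are build-system specific):

$ make clean
$ make CFLAGS="-g -O0"

Then re-check in gdb:

(gdb) break gssint_mechglue_initialize_library
(gdb) run
(gdb) fin     # check the "Value returned is ..." line again
(gdb) next    # with -O0, stepping follows the source line order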

How to debug segmentation fault?

It works when, in the loop, I set every element to 0 or to entry_count-1.
It works when I set it up so that entry_count is small, and I write it by hand instead of by loop (sorted_order[0] = 0; sorted_order[1] = 1; ... etc).
Please do not tell me what to do to fix my code. I will not be using smart pointers or vectors for very specific reasons. Instead focus on the question:
What sort of conditions can cause this segfault?
Thank you.
---- OLD -----
I am trying to debug code that isn't working on a unix machine. The gist of the code is:
int *sorted_array = (int*)memory;
// I know that this block is large enough
// It is allocated by malloc earlier
for (int i = 0; i < entry_count; ++i) {
    sorted_array[i] = i;
}
There appears to be a segfault somewhere in the loop. Switching to debug mode, unfortunately, makes the segfault stop. Using cout debugging I found that it must be in the loop.
Next I wanted to know how far into the loop the segfault happened, so I added:
std::cout << i << '\n';
It showed the entire range it was supposed to be looping over, and there was no segfault.
With a little more experimentation, I eventually created a string stream before the loop and wrote an empty string into it for each iteration of the loop, and there was no segfault.
I tried assorted other operations to figure out what was going on. I tried setting a variable j = i; and the like, but I haven't found anything that works.
Running valgrind, the only information I got on the segfault was that it was a "General Protection Fault" and something about the default response to signal 11. It also mentions a "Conditional jump or move depends on uninitialised value(s)", but looking at the code I can't figure out how that's possible.
What can this be? I am out of ideas to explore.
This is clearly a symptom of invalid memory use within your program. It would be difficult to pinpoint from your code snippet alone, as the crash is most likely a side effect of something else bad that has already happened.
However, you mentioned that the problem is reproducible under Valgrind, so you can have Valgrind attach a debugger to your program (a.out) when it detects the problem:
$ valgrind --tool=memcheck --db-attach=yes ./a.out
This way, Valgrind drops your program into the debugger (GDB) when the first memory error is detected, so that you can debug it live. This is often the best way to understand and resolve such a problem.
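(Note: newer Valgrind releases removed --db-attach. The replacement workflow, sketched here and somewhat version-dependent, uses Valgrind's embedded gdbserver:

$ valgrind --vgdb=yes --vgdb-error=1 ./a.out

and, from a second terminal:

$ gdb ./a.out
(gdb) target remote | vgdb

Valgrind then stops at the first detected error and hands control to GDB.)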
Once you have figured out the first error, fix it, rerun, and see what other errors you get; repeat until Valgrind reports no errors.
That said, you should avoid raw pointers in modern C++ programs and use std::vector or std::unique_ptr instead, as others have suggested.
Valgrind and GDB are very useful.
The one I used most recently was GDB - I like it because it showed me the exact line number the segmentation fault was on.
Here are some resources that can guide you on using GDB:
GDB Tutorial 1
GDB Tutorial 2
If you still cannot figure out how to use GDB with these tutorials, there are tons more on Google! Just search for "debugging segmentation faults with GDB"!
Good luck :)
That is hard. I used Valgrind tools to debug segfaults, and they usually pointed at the violation.
Likely your problem is that you are writing to freed memory, i.e. the block behind sorted_array goes out of scope or gets freed.
Adding more code hides the problem because it shifts the data allocation around.
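A hypothetical illustration of that failure mode (not the poster's actual code): whether the write crashes depends on what happens to occupy the freed block afterwards, which is why adding unrelated code can make the crash come and go.

#include <cstdlib>

int* make_indices(int n) {
    int* p = (int*)std::malloc(n * sizeof(int));
    std::free(p);  // stand-in for the real bug: the block is released too early
    return p;      // dangling pointer
}

int main() {
    int* sorted_array = make_indices(1000);
    for (int i = 0; i < 1000; ++i)
        sorted_array[i] = i;  // undefined behavior: may work, corrupt data, or segfault
}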
After a few days of experimentation, I figured out what was really going on.
For some reason the machine segfaults on unaligned access. That is, the integers I was writing were not being written to addresses that were multiples of four bytes. Before the loop I computed the offset and shifted the pointer up by that much:
int offset = (4 - (uintptr_t)(memory) % 4) % 4;
memory += offset;
After doing this everything behaved as expected again.
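For reference, the same adjustment can be written with std::align from <memory> (C++11), which also verifies that the aligned array still fits in the block - something the manual modulo arithmetic above does not check. A sketch; the helper name and the bytes_available parameter are mine:

#include <cstddef>
#include <memory>

// Returns a pointer inside [memory, memory + bytes_available) that is aligned
// for int and has room for count ints, or nullptr if it cannot fit.
int* aligned_int_view(void* memory, std::size_t bytes_available, std::size_t count) {
    void* p = memory;
    std::size_t space = bytes_available;
    if (std::align(alignof(int), count * sizeof(int), p, space))
        return static_cast<int*>(p);  // std::align advanced p as needed
    return nullptr;
}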

Why won't the python threads find the number?

I am trying to make a game where one person chooses a number and then two computer threads try to find it with random guesses. I tried to make the "AI" smarter by keeping a list of guesses each thread had already made, so it would cut down on total guesses. For some reason, the computers never find the number, and my CPU usage spikes, showing that they are still looking for it. What is the flaw in my code? Thanks a bunch in advance! Also, any improvements to make the AI guess better would be appreciated.
Code (This is just the part where the computers guess, not the full thing):
def first_computer():
    global computer1_score
    computer1_score = 0
    computer1_guess = -1
    one_guess_list = []
    while computer1_guess != pick_number:
        computer1_guess = random.randint(1, 1000000)
        if computer1_guess in one_guess_list:
            pass
        else:
            one_guess_list.append(computer1_guess)
            computer1_score += 1
    print computer_name_one.upper() + " got the answer in " + str(computer1_score) + " guesses"

def second_computer():
    global computer2_score
    computer2_score = 0
    computer2_guess = -1
    two_guess_list = []
    while computer2_guess != pick_number:
        computer2_guess = random.randint(1, 1000000)
        if computer2_guess in two_guess_list:
            pass
        else:
            two_guess_list.append(computer2_guess)
            computer2_score += 1
    print computer_name_two.upper() + " got the answer in " + str(computer2_score) + " guesses"
Your basic algorithm would take a long time in any event: each random guess has a one-in-1,000,000 chance of being correct.
Your program makes things worse by using a list to store the previous guesses. The more failed guesses go into the list, the slower your program gets, because Python has to search through the list one number at a time. That membership test is an "O(n)" operation; with a set it is "O(1)". The larger n becomes, the slower O(n) gets.
http://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/
You can speed your program up a bit by switching to a set rather than a list to store the incorrect guesses (e.g. one_guess_list = set() and one_guess_list.add(...) instead of .append(...)).
You can speed your program up a lot by telling the guesser whether each guess was too low or too high and using a binary search to close in on the number: always guess the midpoint of the remaining range, which finds any number up to 1,000,000 in at most 20 guesses. Maybe that's cheating? I'm not sure what your goals are with this program.

Interpreting GPerfTools sample count

I'm struggling a little with reading the textual output GPerfTools generates. I think part of the problem is that I don't fully understand how the sampling method operates.
From Wikipedia I gather that sampling profilers usually work by having the OS periodically interrupt the program and query its current instruction pointer. Now, my knowledge of assembly is a little rusty, so I'm wondering what it means if the instruction pointer points to method m at any given time. I.e., does it mean that the function is about to be called, or that it is currently executing, or both?
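For concreteness, here is roughly what such a sampler does on Linux/x86-64 - a minimal sketch assuming glibc's ucontext layout (real profilers like GPerfTools also record the whole call stack, not just one address):

#include <signal.h>
#include <sys/time.h>
#include <ucontext.h>
#include <atomic>
#include <cstdio>

// Lock-free atomic, so storing from the signal handler is safe.
static std::atomic<unsigned long> last_pc{0};

static void on_sample(int, siginfo_t*, void* uc) {
    // REG_RIP is the interrupted instruction pointer: the code at this
    // address was currently executing when SIGPROF fired.
    ucontext_t* ctx = static_cast<ucontext_t*>(uc);
    last_pc.store((unsigned long)ctx->uc_mcontext.gregs[REG_RIP],
                  std::memory_order_relaxed);
}

int main() {
    struct sigaction sa = {};
    sa.sa_sigaction = on_sample;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigaction(SIGPROF, &sa, nullptr);

    itimerval tv = {{0, 10000}, {0, 10000}};  // fire every 10 ms of CPU time
    setitimer(ITIMER_PROF, &tv, nullptr);

    volatile double x = 0;                    // busy work to be sampled
    for (long i = 0; i < 500000000; ++i) x += i;
    std::printf("last sampled pc: %#lx\n", (unsigned long)last_pc.load());
    return 0;
}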
There's a difference, if I'm not mistaken: in the first case the sample count (i.e. the number of times m is seen while taking samples) would translate to the absolute call count of m, while in the latter case it simply translates to the number of times m was seen, i.e. a mere indication of the relative time spent in this method.
Can someone clarify?