How to interpret results from kcachegrind - c++

Could anyone tell me how to interest the results from kcachegrind.
I had two versions of my code (v1, v2) both compiled in debug mode. I ran them through valgrind with options:
valgrind --tool=callgrind -v ....
The output files thus generated are opened in kcachegrind. Now I already found the version v2 of the code runs more faster than first version, v1 as it meant to be. But how do i inperet a result from kcachegrind's call graph.
In kcachegrind All Callers tab, I have the following columns: Incl. , Distance, Called, Caller.
IIUC, Called and caller are the no of times the 'caller' was called in the program. But I dont know about others.
Another thing is when selecting a particular function and then
the 'callers' tab it shows some more information. Ir, Ir per call, count, caller
and in the types tab: `EventType, Incl. Self, short, Formula.
I dont have any idea here.
So far I had read these questions:
KCachegrind interpretation confusion
Confused about profiling result

I use QCacheGrind, so I apologize if something on my screen isn't quite the same as what you see. From what I understand, QCacheGrind is a direct Qt port of KCacheGrind. Additionally, I have the ability to toggle between an Instruction Count and a % of total instructions. For consistency I will refer to the Instruction Count view on any column that can be toggled in this way.
The "All Callers" tab columns should represent the following:
Incl.: The number of instructions that this function generated as a whole broken down by each caller. Because callers are a hierarchy (hence the distance column) there may be several that have the same value if your call stack is deep.
Distance: How many function calls separated is the selected line from the function that is selected in the Flat Profile panel.
Called: The number of time the Caller called the a function that ultimately led to the execution of the selected function).
Caller: The function that directly called or called another caller of your selected function (as determined by Distance).
The Callers tab is more straightforward. It shows the functions that have a distance of 1 from your selected function. In other words, these are the functions that directly invoke your selected function.
Ir: The number of instructions executed in total by the selected function after being called by this caller.
Ir per call: The number of instructions executed per call.
Count: The number of times the selected function was called by the caller.
Caller: The function that directly called the selected function.
For Events, see this page for the handbook. I suspect that if you didn't define your own types all you should see is "Instruction Fetch" and possibly "Cycle Estimation." The quick breakdown on these columns is as follows:
Incl.: Again the total instructions performed by this function and all functions it calls beneath it.
Self: The instructions performed exclusively by this function. This counter only tracks instructions used by this function, not any instruction used by functions that are called by this function.
Short and Formula: These columns are used when defining a custom Event Type. Yours should either be blank or very short (like CEst = Ir) unless you end up defining your own Types.

Related

gcc gprof/gcov/other - how to get sequence of function calls/exits + control flow statements

BACKGROUND
We have testers for our embedded GUI product and when a tester declares "test failed", sometimes it's hard for us developers to reproduce the exact problem because we don't have the exact trace of what happened.
We do currently have a logging framework but us devs have to manually input those logging statements in the code which is fine . . . except when a hard-to-reproduce bug occurs and we didn't have a log statement at the 'right' location and then when we re-build, re-run the test with the same steps, we get a different result.
ISSUE
We would like a solution wherein the compiler produces extra instrumentation code that allows us to see the exact sequence of events including, at the minimum:
function enter/exit (already provided by -finstrument-functions)
control-flow statement enter i.e. entering if/else, which case statement we jumped to
The log would look like this:
int main() entered
if-line 5 entered
else-line 10 entered
void EventLoop() entered
. . .
Some additional nice-to-haves are
Parameter values on function entry AND exit (for pass-by-reference types)
Function return value
QUESTION
Are there any gcc tools or even paid tools that can do this instrumentation automatically?
You can either use gdb for that, and you can automate that (I've got a tool for that in the works, you find it here or you can try to use gcov.
The gcov implementation is so, that it loads the latest gcov data when you start the program. But: you can manually dump and load the data. If you call __gcov_flush it will dump the current data and reset the current state. However: if you do that several times it will always merge the data, so you'd also need to rename the gcov data file then.

gnuradio: how to change the noutput_items dynamically when writing OOT block?

When I make a OOT block in gnuradio
class mod(gr.sync_block):
"""
docstring for block mod
"""
def __init__(self):
gr.sync_block.__init__(self,
name="mod",
in_sig=[np.byte],
out_sig=[np.complex64])
def work(self, input_items, output_items):
in0 = input_items[0]
out = output_items[0]
result=do(....)
out[:]=result
return len(output_items[0])
I get:
ValueError: could not broadcast input array from shape (122879) into shape (4096)
How can I solve it?
GRC is as below:
selector :input index and output index are controlled by WX GUI Chooser block
FSK4 MOD: modulate fsk4 signal and write data to raw.bin
FSK4 DEMOD : read data from raw.bin and demodulate
file source -> /////// -> FSK4 MOD -> FSK4 DEMOD -> NULL SINK
selector
file source -> ////// -> GMKS MOD -> GMSK DEMOD ->NULL SINK
when the input index or output index is changed,the whole flow graph will be not responding.
There's two things:
You have a bug somewhere, and the solution is not to change something, but fix that bug. The full Python error message will tell you exactly in which line the error is.
noutput_items is a variable that GNU Radio sets at runtime to let you know how much output you might produce in this call to work. Hence, it's not something you can set, but it's something your work method must respect.
I think it's fair to assume that you're not very aware of how GNU Radio works:
GNU Radio is based on calling your block's work function when there is enough output space available and enough input items to process. The amount of output space that your block can use is passed to your work as a parameter, and will change between calls to work.
I very strongly recommend going through chapters 1-3 of the official Guided Tutorials if you haven't already. We always try to keep these tutorials up-to-date.
EDIT: Your command shows that you have not really understood what I meant, sorry. So: GNU Radio calls your work method over and over again while it's executing.
For example, it might call work with 4000 input items and 4000 output items space (you have a sync block, therefore number of input==number of output). Your work processes the first 1000 of that, and therefore return 1000. So there's 3000 items left.
Now, the upstream block does something, so there's 100 new items. Because the 3000 from before are still there, your block's work will get called with 3100 items.
Your work processes any number of items, and returns that number. GNU Radio makes sure that the "remaining" items stay available and will call your work again if there is enough in- our output.

Need an example of Ypsilon usage

I started to mess with Ypsilon, which is a C++ implementation of Scheme.
It conforms R6RS, features fast garbage collector, supports multi-core CPUs and Unicode but has a LACK of documentation, C++ code examples and comments in the code!
Authors provide it as a standalone console application.
My goal is to use it as a scripting engine in an image processing application.
The source code is well structured, but the structure is unfamiliar.
I spent two weeks penetrating it, and here's what I've found out:
All communication with outer world is done via C++ structures called
ports, they correspond to Scheme ports.
Virtual machine has 3 ports: IN, OUT and ERROR.
Ports can be std-ports (via console), socket-ports,
bytevector-ports, named-file-ports and custom-ports.
Each custom port must provide a filled structure called handlers.
Handlers is a vector containing 6 elements: 1st one is a boolean
(whether
port is textual), and other five are function pointers (onRead, onWrite, onSetPos, onGetPos, onClose).
As far as I understand, I need to implement 3 custom ports (IN, OUT and ERROR).
But for now I can't figure out, what are the input parameters of each function (onRead, onWrite, onSetPos, onGetPos, onClose) in handlers.
Unfortunately, there is neither example of implementing a custom port no example of following stuff:
C++ to Scheme function bindings (provided examples are a bunch of
.scm-files, still unclear what to do on the C++ side).
Compiling and
running bytecode (via bytevector-ports? But how to compile text to
bytecode?).
Summarizing, if anyone provides a C++ example of any scenario mentioned above, it would significantly save my time.
Thanks in advance!
Okay, from what I can read of the source code, here's how the various handlers get called (this is all unofficial, based purely on source code inspection):
Read handler: (lambda (bv off len)): takes a bytevector (which your handler will put the read data into), an offset (fixnum), and a length (fixnum). You should read in up to len bytes, placing those bytes into bv starting at off. Return the number of bytes actually read in (as a fixnum).
Write handler: (lambda (bv off len)): takes a bytevector (which contains the data to write), an offset (fixnum), and a length (fixnum). Grab up to len bytes from bv, starting at off, and write them out. Return the number of bytes actually written (as a fixnum).
Get position handler: (lambda (pos)) (called in text mode only): Allows you to store some data for pos so that a future call to the set position handler with the same pos value will reset the position back to the current position. Return value ignored.
Set position handler: (lambda (pos)): Move the current position to the value of pos. Return value ignored.
Close handler: (lambda ()): Close the port. Return value ignored.
To answer another question you had, about compiling and running "bytecode":
To compile an expression, use compile. This returns a code object.
There is no publicly-exported approach to run this code object. Internally, the code uses run-vmi, but you can't access this from outside code.
Internally, the only place where compiled code is loaded and used is in its auto-compile-cache system.
Have a look at heap/boot/eval.scm for details. (Again, this is not an official response, but based purely on personal experimentation and source code inspection.)

Trying to print the registers' values from the stack using a pin tool

I am trying to print out the stack in different routines using a pin tool. I am able to get all of the routines but I am a little confused on how to get the addresses stored in the registers in the stack of that routine.
What I have is this:
VOID SETRTN_CONTEXT(CONTEXT * ctxt)
{
ADDRINT reg_address;
PIN_SaveContext(ctxt, &m_ctxt);
reg_address = PIN_GetContextReg(&m_ctxt, REG_STACK_PTR);
}
and in another function I have this piece of code that calls that function:
for(rtn = SEC_RtnHead(sec); RTN_Valid(rtn); rtn = RTN_Next(rtn) )
{
RTN_Open(rtn);
RTN_InsertCall(rtn, IPOINT_BEFORE, (AFUNPTR)SETRTN_CONTEXT,
IARG_CONST_CONTEXT, IARG_THREAD_ID, IARG_END);
RTN_Close(rtn);
}
I am a little confused on when the routine calls that function since I am only getting one result and I get it after attaching with Pin and waiting a couple of seconds.
Any pinheads that might help me on this one? I understand that I need the context from a routine in order to get the registers but I cannot find any function that returns the context as an object...
In your RTN_InsertCall, you add the thread id, and in your SETRTN_CONTEXT function declaration you don't receive the thread id... might want to fix that.
Also, in your analysis routine SETRTN_CONTEXT, you're not actually saving anything external to the application. I could be wrong if m_ctxt is a global variable that you're manipulating elsewhere, which how could that be sound unless you did that for every time the analysis routine ran and in a thread safe way?
Clearly, you want to write the information to some file or output. I recommend using some kind of xml tool; this makes it easy to parse, and if you write your pintools smartly, you can exchange the format of the output by obeying some interface contract.
Also to clarify your confusion, you try to insert the analysis routine to run before every single function in a particular image; every time that function is called in that image, your SETRTN_CONTEXT runs.

How to make cplex not output to terminal

I am using the IBM cplex optimizer to solve an optimization problem and I don't want all terminal prints that the optimizer does. Is there a member that turns this off in the IloCplex or IloModel class? These are the prints about cuts and iterations. Prints to the terminal are expensive and my problem will eventually be on the order of millions of variables and I don't want to waste time with these superfluous outputs. Thanks.
Using cplex/concert, you can completely turn off cplex's logging to the console with
cpx.setOut(env.getNullStream())
Where cpx is an IloCplex object. You can also use the setOut function to redirect logs to a file.
There are several cplex parameters to control what gets logged, for example MIPInterval will set the number of MIP nodes searched between lines. Turning MIPDisplay to 0 will turn off the display of cuts except when new solutions are found, while MIPDisplay of 5 will show detailed information about every lp-subproblem.
Logging-related parameters include MIPInterval MIPDisplay SimDisplay BarDisplay NetDisplay
You set parameters with the setParam function.
cpx.setParam(IloCplex::MIPInterval, 1000)