Understanding why some functions are not profiled in gprof - C++

I am trying to use gprof, and the legend for the calls column reads:
calls the number of times this function was invoked, if
this function is profiled, else blank.
Some of my functions have that field missing. What does that mean? I didn't pass any special options.
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 68.77      9.38     9.38                             void BilinearForm::int3d...
 27.71     13.16     3.78                             void BilinearForm::int2d...
  1.54     13.37     0.21                             BilinearForm::finalize()
  0.73     13.47     0.10 11275600     0.00     0.00  frame_dummy
...
This is what the head of the output looks like.
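gprof fills in the calls column only for functions whose object files were compiled with -pg: call counts come from the mcount instrumentation that -pg inserts, while the time columns come from sampling the whole executable, so uninstrumented code still accumulates time but shows no count. A minimal sketch of the difference (file names here are illustrative):

g++ -pg -c bilinear.cpp             # instrumented: calls column filled in
g++ -c legacy.cpp                   # not instrumented: calls column blank
g++ -pg bilinear.o legacy.o -o app
./app                               # running the program writes gmon.out
gprof app gmon.out > profile.txt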

Related

How to test the integrity of hardware on an AWS instance?

I have a cluster of consumers (50 or so instances) consuming from Kafka partitions.
I notice that there is one server that is consistently slow. Its CPU usage is always around 80-100%, while the others are around 50%.
Originally I thought there was a slight chance this was traffic-dependent, so I manually switched the partitions that the slow loader was consuming.
However, I did not observe an increase in processing speed.
I also don't see CPU steal in iostat, but since all the consumers run the same code, I suspect there is some bottleneck in the hardware.
Unfortunately, I can't just replace the server unless I can provide conclusive proof that the hardware is the problem.
So I want to write a load-testing script that pinpoints the bottleneck.
My plan is to write a loop in Python that performs n computations and measure the maximum rate of computation the slow consumer can sustain versus what a fast consumer can sustain.
What other testing strategies can I try?
Perhaps I should also test for a disk bottleneck by having my Python script write to a file? A sketch of both tests follows.
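Something like this is what I have in mind (the iteration count, block size, and file name are arbitrary choices; the idea is to run the same script on the slow and a fast instance and compare the timings):

import os
import time

def cpu_benchmark(n=10000000):
    # Time n integer multiply-adds; pure CPU work, no I/O.
    start = time.time()
    acc = 0
    for i in range(n):
        acc = (acc * 31 + i) % 1000003
    return time.time() - start

def disk_benchmark(n_blocks=2000, block_size=1 << 20, path='bench.tmp'):
    # Time writing n_blocks blocks of block_size bytes, forced to disk.
    block = os.urandom(block_size)
    start = time.time()
    with open(path, 'wb') as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # flush past the page cache to the device
    os.remove(path)
    return time.time() - start

if __name__ == '__main__':
    print('cpu:  %.2f s' % cpu_benchmark())
    print('disk: %.2f s' % disk_benchmark())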
Here is the fast consumer's iostat:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          50.01    0.00    3.96    0.13    0.12   45.77

Device:            tps    kB_read/s    kB_wrtn/s    kB_read      kB_wrtn
xvda              1.06         0.16        11.46     422953     30331733
xvdb            377.63         0.01     46937.99      35897 124281808572
xvdc            373.43         0.01     46648.25      26603 123514631628
md0             762.53         0.01     93586.24      22235 247796440032
Here is the slow consumer's iostat:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          81.58    0.00    5.28    0.11    0.06   12.98

Device:            tps    kB_read/s    kB_wrtn/s    kB_read      kB_wrtn
xvda              1.02         0.40        13.74     371145     12685265
xvdb            332.85         0.02     40775.06      18229  37636091096
xvdc            327.42         0.01     40514.44      10899  37395540132
md0             676.47         0.01     81289.50      11287  75031631060

Visual Studio profiler: too much noise in results

Can I restrict the results in Visual Studio? I just want to see results for code that I have written. I am following the "500 line OpenGL" tutorial and I am jealous of the author's results in the tutorial:
  %   cumulative   self               self     total
 time   seconds   seconds      calls  ms/call  ms/call  name
 69.16      2.95     2.95    3000000     0.00     0.00  line(int, int, int, int, TGAImage&, TGAColor)
 19.46      3.78     0.83  204000000     0.00     0.00  TGAImage::set(int, int, TGAColor)
  8.91      4.16     0.38  207000000     0.00     0.00  TGAColor::TGAColor(TGAColor const&)
  1.64      4.23     0.07          2    35.04    35.04  TGAColor::TGAColor(unsigned char, unsigned char, unsigned char, unsigned char)
  0.94      4.27     0.04                               TGAImage::get(int, int)
Here are my results below :( I am using Visual Studio Community 2017, with instrumentation as set up by the "Performance Wizard". If I just use "CPU Usage" I get an invalid process error; I have read that this happens because my program exits too quickly.

How to print out each class's Average Precision using MXNet Faster R-CNN for object detection

I am using Faster R-CNN (MXNet) for object detection on my own dataset, which has 9 classes (including background). However, I found that it only prints the average accuracy over all 9 classes during the training process. Likewise, during testing it only prints the average precision and recall over all 9 classes. I am wondering how I can print each class's accuracy during training, and each class's recall and precision during testing?
Or can someone tell me where I should look to approach this goal?
An ideal example is shown in the attached image.
You can use the scikit-learn function sklearn.metrics.precision_recall_fscore_support for this, and sklearn.metrics.classification_report for a prettified version.
At test time, you will have an array of true values (Y_true) and an array of predicted probabilities for each class (Y_prob). Use these as follows:
import numpy as np
from sklearn.metrics import classification_report

Y_pred = np.argmax(Y_prob, axis=1)
print(classification_report(Y_true, Y_pred))
             precision    recall  f1-score   support

    class 0       0.50      1.00      0.67         1
    class 1       0.00      0.00      0.00         1
    class 2       1.00      0.67      0.80         3

avg / total       0.70      0.60      0.61         5
Slightly more work is required for these to be listed every N batches at training time. You can set a callback argument and a custom eval_metric if you're using the module.fit method:
model = mx.mod.Module(symbol=...)
model.fit(...,
          batch_end_callback=mx.callback.Speedometer(batch_size),
          eval_metric=custom_metric, ...)
You'll need to create a new class for custom_metric that extends mxnet.metric.EvalMetric and implements a get method that prints out (or even returns) the per-class metrics.
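A minimal sketch of such a class (the class name and the per-class bookkeeping are illustrative, not part of the MXNet API; it assumes integer labels and softmax-style class probabilities in the network outputs):

import mxnet as mx
import numpy as np

class PerClassAccuracy(mx.metric.EvalMetric):
    # Illustrative sketch: track accuracy for each class separately.
    def __init__(self, num_classes, name='per_class_acc'):
        self.num_classes = num_classes  # set before super(), which calls reset()
        super(PerClassAccuracy, self).__init__(name)

    def reset(self):
        self.correct = np.zeros(self.num_classes)
        self.total = np.zeros(self.num_classes)

    def update(self, labels, preds):
        for label, pred in zip(labels, preds):
            y_true = label.asnumpy().astype('int32').ravel()
            y_pred = np.argmax(pred.asnumpy(), axis=1).ravel()
            for c in range(self.num_classes):
                mask = (y_true == c)
                self.total[c] += mask.sum()
                self.correct[c] += (y_pred[mask] == c).sum()

    def get(self):
        # Returning parallel lists makes the callback print one value per class.
        names = ['class_%d_acc' % c for c in range(self.num_classes)]
        values = [self.correct[c] / max(self.total[c], 1.0)
                  for c in range(self.num_classes)]
        return names, values

Passing custom_metric = PerClassAccuracy(num_classes=9) into fit above would then report nine per-class values every time the callback fires.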

How to generate ocamlprof.dump with ocamlcp or ocamloptp

I read the manual about profiling (ocamlprof): http://caml.inria.fr/pub/docs/manual-ocaml-4.01/profil.html
I am having a hard time using it. Here is how I tried an example with gprof:
For example, I have a file named ex.ml.
I run: sudo ocamlopt -p ex.ml -o ex
then I use: gprof ex > profile.txt
It shows me a bunch of information, but the time column is all 0.
For instance (this is taken from my real program):
Flat profile:
Each sample counts as 0.01 seconds.
no time accumulated
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
  0.00      0.00     0.00      415     0.00     0.00  caml_page_table_modify
  0.00      0.00     0.00       57     0.00     0.00  caml_get_exception_backtrace
I don't understand why every function shows 0.00 in the time column.
The link above mentions a file ocamlprof.dump, but I don't know which command generates it. How can I generate ocamlprof.dump? And how can I find out where a name such as caml_page_table_modify comes from?
Thank you very much for your help.
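From the manual linked in the question, ocamlprof.dump is produced by the profiling compilers (ocamlcp for bytecode, ocamloptp for native code), not by gprof. A minimal sketch for a single-file program, using the per-function counting mode:

ocamlcp -P f ex.ml -o ex    # instrument: -P f counts function calls
./ex                        # running the program writes ocamlprof.dump
ocamlprof ex.ml             # prints the source annotated with the counts

As for caml_page_table_modify: names with a caml_ prefix like this are C functions inside the OCaml runtime system, which is why they show up in gprof output rather than in ocamlprof's source annotations.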

MinGW gprof inaccurate results?

I've been profiling a program with gprof on Linux (Ubuntu 11.04) and Windows (7, latest version of MinGW), same program on more or less the same dataset each time, and getting significantly different results. (Significantly as in they would lead to different conclusions about what part of the code needs optimizing.)
It's possible that the results could be legitimately different on the two systems, but I also have to consider the possibility that one result set is inaccurate and should be ignored, and a priori the more likely one would be MinGW as gprof is less extensively tested on Windows than on Linux. A stronger argument for that conclusion is that the results on Windows look distinctly weird:
  %   cumulative   self               self     total
 time   seconds   seconds      calls  us/call  us/call  name
 27.43      1.13     1.13   68589813     0.02     0.02  addt
 21.48      2.02     0.89                               tok
 19.17      2.81     0.79                               hash
  9.95      3.21     0.41                               slot
  7.89      3.54     0.33                               nextx
  4.85      3.74     0.20                               next
  3.52      3.88     0.14   27809047     0.01     0.01  get
  0.85      3.92     0.04                               eol
  0.73      3.95     0.03                               __mingw_pformat
  0.73      3.98     0.03                               ch
  0.73      4.01     0.03                               tokx
  0.49      4.03     0.02                               slot
  0.49      4.05     0.02                               tok
  0.24      4.06     0.01     166896     0.06     0.06  mk2
  0.24      4.07     0.01       6693     1.49     1.49  initt
  0.24      4.08     0.01                               __pformat_putchars
  0.24      4.09     0.01                               hashs
  0.24      4.10     0.01                               pop
  0.24      4.11     0.01                               quoted
  0.12      4.12     0.01                               eat
  0.12      4.12     0.01                               expand
  0.00      4.12     0.00  145841014     0.00     0.00  initparse
There are a lot of gaps, and then initparse, which is an initialization function called only once that calls almost nothing else, is reported as having been called one hundred and forty-five million times.
Should I disregard the results from Windows and just use the ones from Linux? Or is there some issue with the reporting of number of calls on Windows that doesn't affect the percentage time results? Or am I misreading the output or otherwise misusing the tool?