Code slower despite what gprof says - c++

I've been given some C++ code to optimize, and the first step is to introduce parallelism with OpenMP. I was able to identify several functions that badly needed optimization, so I focused on them.
The problem is that the execution time has roughly doubled, while the profiles seem to tell me that it should be much faster.
Here is the gprof flat profile I get without OpenMP:
38.07 5.55 5.55 __tcf_0
20.99 8.61 3.06 86196302 0.04 0.04 is_neighbor(int, int, int, int, double)
13.24 10.54 1.93 425940 4.53 4.53 Ellips::data_fiting(double*, int, int, double) const
9.05 11.86 1.32 _fu51___ZSt4cout
5.90 12.72 0.86 5645243 0.15 0.15 Ellips::Ellips()
3.70 13.26 0.54 4013067 0.13 0.13 intersect(Ellips&, Ellips&)
2.40 13.61 0.35 dgemv_
1.44 13.82 0.21 ddot_
1.23 14.00 0.18 141257881 0.00 0.00 Configuration::get_position(int)
1.03 14.15 0.15 __tcf_0
0.82 14.27 0.12 594893 0.20 0.20 Ellips::Ellips(double, double, int, int)
0.41 14.33 0.06 7099 8.45 400.75 Configuration::Configuration(double, double, int, int, int, int, double*, double)
0.34 14.38 0.05 3203279 0.02 0.02 Ellips::operator=(Ellips const&)
0.34 14.43 0.05 ceil
0.21 14.46 0.03 dnrm2_
0.14 14.48 0.02 _fu32___ZSt4cout
0.14 14.50 0.02 dcopy_
0.14 14.52 0.02 dscal_
0.07 14.53 0.01 7775127 0.00 0.00 Configuration::get_Ellips(int)
0.07 14.54 0.01 6239588 0.00 0.00 Ellips::~Ellips()
0.07 14.55 0.01 4349523 0.00 0.00 Configuration::get_data_fit(int)
0.07 14.56 0.01 7097 1.41 1.41 Graph<float, float, float>::maxflow(bool, Block<int>*)
0.07 14.57 0.01 _fu53___ZNSs4_Rep20_S_empty_rep_storageE
0.07 14.58 0.01 floor
0.00 14.58 0.00 432232036 0.00 0.00 Configuration::save_config(std::string)
0.00 14.58 0.00 1180034 0.00 0.00 Ellips::data_fiting(double, double*, double*, double, int, int, double) const
0.00 14.58 0.00 1173980 0.00 0.00 Ellips::get_cx() const
0.00 14.58 0.00 1164513 0.00 0.02 Configuration::add_Ellips(Ellips const&, int, double)
0.00 14.58 0.00 1157360 0.00 0.00 Ellips::get_cy() const
0.00 14.58 0.00 425940 0.00 0.00 shift_cost_exp1(double, double)
0.00 14.58 0.00 23625 0.00 0.00 Graph<float, float, float>::augment(Graph<float, float, float>::arc*)
0.00 14.58 0.00 22504 0.00 0.00 Graph<float, float, float>::process_sink_orphan(Graph<float, float, float>::node*)
0.00 14.58 0.00 21293 0.00 27.35 Configuration::operator=(Configuration const&)
0.00 14.58 0.00 14203 0.00 0.23 Configuration::~Configuration()
0.00 14.58 0.00 14196 0.00 0.00 Configuration::get_nb_Ellipses()
0.00 14.58 0.00 7097 0.00 34.30 Configuration::Configuration(Ellips const&, int, double, int)
0.00 14.58 0.00 7097 0.00 0.00 Graph<float, float, float>::maxflow_init()
0.00 14.58 0.00 7097 0.00 0.00 Graph<float, float, float>::reset()
0.00 14.58 0.00 2406 0.00 0.00 Ellips::get_a() const
0.00 14.58 0.00 2406 0.00 0.00 Ellips::get_b() const
0.00 14.58 0.00 2406 0.00 0.00 Ellips::get_theta() const
0.00 14.58 0.00 1137 0.00 0.00 Graph<float, float, float>::process_source_orphan(Graph<float, float, float>::node*)
0.00 14.58 0.00 7 0.00 38.00 Configuration::Configuration(Configuration const&)
0.00 14.58 0.00 3 0.00 0.32 Configuration::Configuration()
0.00 14.58 0.00 2 0.00 0.00 min_max_val(_IplImage*, double&, double&)
0.00 14.58 0.00 1 0.00 0.00 convert_char_to_double(_IplImage*, double*)
0.00 14.58 0.00 1 0.00 0.00 Graph<float, float, float>::reallocate_nodes(int)
0.00 14.58 0.00 1 0.00 0.00 Graph<float, float, float>::Graph(int, int, void (*)(char*))
And here is the one I get with OpenMP. (The code is a recursive algorithm with no real "ending"; both profiles were obtained after about 7000 iterations of the main loop.)
36.57 4.45 4.45 __tcf_0
25.72 7.58 3.13 86434458 0.04 0.04 is_neighbor(int, int, int, int, double)
12.41 9.09 1.51 _fu51___ZSt4cout
7.97 10.06 0.97 5646276 0.17 0.17 Ellips::Ellips()
4.35 10.59 0.53 4020048 0.13 0.13 intersect(Ellips&, Ellips&)
2.47 10.89 0.30 dgemv_
1.73 11.10 0.21 ddot_
1.64 11.30 0.20 141852099 0.00 0.00 Configuration::get_position(int)
1.15 11.44 0.14 7038 19.89 164.95 Configuration::Configuration(double, double, int, int, int, int, double*, double)
1.07 11.57 0.13 589659 0.22 0.22 Ellips::Ellips(double, double, int, int)
0.99 11.69 0.12 __tcf_0
0.90 11.80 0.11 422280 0.26 0.33 Ellips::data_fiting(double*, int, int, double) const
0.74 11.89 0.09 3208793 0.03 0.03 Ellips::operator=(Ellips const&)
0.41 11.94 0.05 ceil
0.25 11.97 0.03 422280 0.07 0.07 shift_cost_exp1(double, double)
0.25 12.00 0.03 GOMP_parallel_end
0.25 12.03 0.03 _fu53___ZNSs4_Rep20_S_empty_rep_storageE
0.16 12.05 0.02 21110 0.95 32.56 Configuration::operator=(Configuration const&)
0.16 12.07 0.02 7036 2.84 2.84 Graph<float, float, float>::maxflow(bool, Block<int>*)
0.16 12.09 0.02 _fu32___ZSt4cout
0.16 12.11 0.02 daxpy_
0.16 12.13 0.02 dnrm2_
0.08 12.14 0.01 1171018 0.01 0.04 Configuration::add_Ellips(Ellips const&, int, double)
0.08 12.15 0.01 GOMP_parallel_start
0.08 12.16 0.01 dcopy_
0.08 12.17 0.01 dgemm_
0.00 12.17 0.00 432088679 0.00 0.00 Configuration::save_config(std::string)
0.00 12.17 0.00 7813683 0.00 0.00 Configuration::get_Ellips(int)
0.00 12.17 0.00 6235383 0.00 0.00 Ellips::~Ellips()
0.00 12.17 0.00 4360587 0.00 0.00 Configuration::get_data_fit(int)
0.00 12.17 0.00 1187310 0.00 0.00 Ellips::data_fiting(double, double*, double*, double, int, int, double) const
0.00 12.17 0.00 1163572 0.00 0.00 Ellips::get_cx() const
0.00 12.17 0.00 1147536 0.00 0.00 Ellips::get_cy() const
0.00 12.17 0.00 35748 0.00 0.00 Graph<float, float, float>::augment(Graph<float, float, float>::arc*)
0.00 12.17 0.00 33436 0.00 0.00 Graph<float, float, float>::process_sink_orphan(Graph<float, float, float>::node*)
0.00 12.17 0.00 14081 0.00 0.00 Configuration::~Configuration()
0.00 12.17 0.00 14074 0.00 0.00 Configuration::get_nb_Ellipses()
0.00 12.17 0.00 7036 0.00 39.10 Configuration::Configuration(Ellips const&, int, double, int)
0.00 12.17 0.00 7036 0.00 0.00 Graph<float, float, float>::maxflow_init()
0.00 12.17 0.00 7036 0.00 0.00 Graph<float, float, float>::reset()
0.00 12.17 0.00 2424 0.00 0.00 Ellips::get_a() const
0.00 12.17 0.00 2424 0.00 0.00 Ellips::get_b() const
0.00 12.17 0.00 2424 0.00 0.00 Ellips::get_theta() const
0.00 12.17 0.00 2355 0.00 0.00 Graph<float, float, float>::process_source_orphan(Graph<float, float, float>::node*)
0.00 12.17 0.00 7 0.00 44.91 Configuration::Configuration(Configuration const&)
0.00 12.17 0.00 3 0.00 0.37 Configuration::Configuration()
0.00 12.17 0.00 2 0.00 0.00 min_max_val(_IplImage*, double&, double&)
0.00 12.17 0.00 1 0.00 0.00 convert_char_to_double(_IplImage*, double*)
0.00 12.17 0.00 1 0.00 0.00 Graph<float, float, float>::reallocate_nodes(int)
0.00 12.17 0.00 1 0.00 0.00 Graph<float, float, float>::Graph(int, int, void (*)(char*))
Is there a problem with how I'm using the profiler? Or does this come from the code itself? It takes about 12 seconds to complete 1000 iterations without OpenMP, whereas it takes about 31 seconds with OpenMP (measured with omp_get_wtime(), not clock()).
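The gap between the profile and the stopwatch can be examined by recording CPU time and wall-clock time side by side. The following is only a minimal sketch, not the asker's code: `Timing` and `measure` are hypothetical names introduced here, and it uses `std::chrono::steady_clock` instead of `omp_get_wtime()` so that it compiles without OpenMP. It assumes a platform such as Linux with glibc, where `std::clock()` reports processor time consumed by the whole process.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>

// Hypothetical result type: elapsed wall-clock seconds and process CPU seconds.
struct Timing {
    double wall_seconds;
    double cpu_seconds;
};

// Runs `work` once and reports both clocks for that single call.
template <class Work>
Timing measure(Work&& work) {
    const std::clock_t cpu_start = std::clock();
    assert(cpu_start != static_cast<std::clock_t>(-1));  // clock() not available
    const auto wall_start = std::chrono::steady_clock::now();

    work();

    const auto wall_end = std::chrono::steady_clock::now();
    const std::clock_t cpu_end = std::clock();
    return Timing{
        std::chrono::duration<double>(wall_end - wall_start).count(),
        static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC};
}
```

Wrapping one iteration of the main loop in `measure(...)` for both builds would show whether the OpenMP build spends more wall-clock time, more CPU time, or both. A flat profile only attributes sampled time to functions, so time spent starting threads or waiting at barriers may not appear where you expect it.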

Related

Compilation time profiling: What is the "phase opt and generate" stage and how can I speed it up (-ftime-report)

I am profiling the compilation time of my code to determine why compilation is so slow. I am using gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 and have added the compiler flag -ftime-report.
What I notice is that the compilation units that are slow to compile spend the majority of their time in the "phase opt and generate" stage. What exactly is this stage? How can I reduce the time taken by this phase?
For reference, this is what the output for one of the compilation units looks like.
Time variable usr sys wall GGC
phase setup : 0.00 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 1579 kB ( 0%)
phase parsing : 1.74 ( 20%) 0.71 ( 44%) 2.46 ( 24%) 311927 kB ( 36%)
phase lang. deferred : 1.33 ( 15%) 0.34 ( 21%) 1.67 ( 16%) 259524 kB ( 30%)
phase opt and generate : 5.68 ( 65%) 0.58 ( 36%) 6.26 ( 60%) 301021 kB ( 34%)
phase last asm : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 2 kB ( 0%)
|name lookup : 0.44 ( 5%) 0.12 ( 7%) 0.49 ( 5%) 15499 kB ( 2%)
|overload resolution : 0.76 ( 9%) 0.22 ( 13%) 0.92 ( 9%) 130607 kB ( 15%)
garbage collection : 0.33 ( 4%) 0.01 ( 1%) 0.34 ( 3%) 0 kB ( 0%)
dump files : 0.18 ( 2%) 0.04 ( 2%) 0.10 ( 1%) 0 kB ( 0%)
callgraph construction : 0.12 ( 1%) 0.03 ( 2%) 0.14 ( 1%) 6318 kB ( 1%)
callgraph optimization : 0.16 ( 2%) 0.04 ( 2%) 0.19 ( 2%) 82 kB ( 0%)
ipa function summary : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 2289 kB ( 0%)
ipa dead code removal : 0.01 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 0 kB ( 0%)
ipa inheritance graph : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 29 kB ( 0%)
ipa virtual call target : 0.02 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 3 kB ( 0%)
ipa cp : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 1140 kB ( 0%)
ipa inlining heuristics : 0.04 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 2438 kB ( 0%)
ipa function splitting : 0.00 ( 0%) 0.01 ( 1%) 0.01 ( 0%) 451 kB ( 0%)
ipa profile : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%)
ipa pure const : 0.02 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 40 kB ( 0%)
ipa icf : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 4 kB ( 0%)
ipa SRA : 0.10 ( 1%) 0.00 ( 0%) 0.05 ( 0%) 9838 kB ( 1%)
cfg cleanup : 0.08 ( 1%) 0.01 ( 1%) 0.08 ( 1%) 1621 kB ( 0%)
trivially dead code : 0.03 ( 0%) 0.00 ( 0%) 0.06 ( 1%) 0 kB ( 0%)
df scan insns : 0.02 ( 0%) 0.01 ( 1%) 0.05 ( 0%) 18 kB ( 0%)
df multiple defs : 0.02 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 0 kB ( 0%)
df reaching defs : 0.06 ( 1%) 0.00 ( 0%) 0.04 ( 0%) 0 kB ( 0%)
df live regs : 0.19 ( 2%) 0.01 ( 1%) 0.25 ( 2%) 0 kB ( 0%)
df live&initialized regs : 0.05 ( 1%) 0.00 ( 0%) 0.06 ( 1%) 0 kB ( 0%)
df use-def / def-use chains : 0.03 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%)
df reg dead/unused notes : 0.08 ( 1%) 0.00 ( 0%) 0.07 ( 1%) 2152 kB ( 0%)
register information : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 kB ( 0%)
alias analysis : 0.03 ( 0%) 0.00 ( 0%) 0.09 ( 1%) 5413 kB ( 1%)
alias stmt walking : 0.08 ( 1%) 0.00 ( 0%) 0.13 ( 1%) 738 kB ( 0%)
register scan : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 167 kB ( 0%)
rebuild jump labels : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%)
preprocessing : 0.15 ( 2%) 0.21 ( 13%) 0.39 ( 4%) 11918 kB ( 1%)
parser (global) : 0.29 ( 3%) 0.21 ( 13%) 0.51 ( 5%) 105494 kB ( 12%)
parser struct body : 0.18 ( 2%) 0.04 ( 2%) 0.22 ( 2%) 39504 kB ( 5%)
parser enumerator list : 0.01 ( 0%) 0.01 ( 1%) 0.00 ( 0%) 1305 kB ( 0%)
parser function body : 0.18 ( 2%) 0.04 ( 2%) 0.15 ( 1%) 9096 kB ( 1%)
parser inl. func. body : 0.27 ( 3%) 0.02 ( 1%) 0.39 ( 4%) 33105 kB ( 4%)
parser inl. meth. body : 0.21 ( 2%) 0.06 ( 4%) 0.25 ( 2%) 23541 kB ( 3%)
template instantiation : 1.61 ( 18%) 0.43 ( 26%) 2.05 ( 20%) 346006 kB ( 40%)
constant expression evaluation : 0.05 ( 1%) 0.03 ( 2%) 0.02 ( 0%) 1470 kB ( 0%)
early inlining heuristics : 0.00 ( 0%) 0.01 ( 1%) 0.03 ( 0%) 3751 kB ( 0%)
inline parameters : 0.06 ( 1%) 0.02 ( 1%) 0.05 ( 0%) 12991 kB ( 1%)
integration : 0.12 ( 1%) 0.04 ( 2%) 0.26 ( 3%) 53810 kB ( 6%)
tree gimplify : 0.06 ( 1%) 0.02 ( 1%) 0.11 ( 1%) 20691 kB ( 2%)
tree eh : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 2821 kB ( 0%)
tree CFG construction : 0.02 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 8987 kB ( 1%)
tree CFG cleanup : 0.11 ( 1%) 0.02 ( 1%) 0.13 ( 1%) 208 kB ( 0%)
tree tail merge : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 880 kB ( 0%)
tree VRP : 0.17 ( 2%) 0.00 ( 0%) 0.18 ( 2%) 7001 kB ( 1%)
tree Early VRP : 0.05 ( 1%) 0.00 ( 0%) 0.05 ( 0%) 7256 kB ( 1%)
tree copy propagation : 0.00 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 104 kB ( 0%)
tree PTA : 0.13 ( 1%) 0.05 ( 3%) 0.25 ( 2%) 1906 kB ( 0%)
tree PHI insertion : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 736 kB ( 0%)
tree SSA rewrite : 0.06 ( 1%) 0.01 ( 1%) 0.04 ( 0%) 6289 kB ( 1%)
tree SSA other : 0.00 ( 0%) 0.02 ( 1%) 0.03 ( 0%) 940 kB ( 0%)
tree SSA incremental : 0.08 ( 1%) 0.00 ( 0%) 0.03 ( 0%) 1717 kB ( 0%)
tree operand scan : 0.08 ( 1%) 0.00 ( 0%) 0.08 ( 1%) 19096 kB ( 2%)
dominator optimization : 0.18 ( 2%) 0.01 ( 1%) 0.15 ( 1%) 5240 kB ( 1%)
backwards jump threading : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 244 kB ( 0%)
tree SRA : 0.03 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 1712 kB ( 0%)
tree CCP : 0.10 ( 1%) 0.02 ( 1%) 0.10 ( 1%) 1097 kB ( 0%)
tree reassociation : 0.00 ( 0%) 0.01 ( 1%) 0.00 ( 0%) 50 kB ( 0%)
tree PRE : 0.15 ( 2%) 0.01 ( 1%) 0.18 ( 2%) 4977 kB ( 1%)
tree FRE : 0.13 ( 1%) 0.02 ( 1%) 0.12 ( 1%) 2498 kB ( 0%)
tree linearize phis : 0.02 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 563 kB ( 0%)
tree forward propagate : 0.09 ( 1%) 0.00 ( 0%) 0.10 ( 1%) 1071 kB ( 0%)
tree phiprop : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 11 kB ( 0%)
tree conservative DCE : 0.04 ( 0%) 0.01 ( 1%) 0.02 ( 0%) 133 kB ( 0%)
tree aggressive DCE : 0.04 ( 0%) 0.01 ( 1%) 0.04 ( 0%) 7238 kB ( 1%)
tree DSE : 0.00 ( 0%) 0.01 ( 1%) 0.03 ( 0%) 254 kB ( 0%)
tree loop invariant motion : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 17 kB ( 0%)
scev constant prop : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 112 kB ( 0%)
tree loop unswitching : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 349 kB ( 0%)
complete unrolling : 0.01 ( 0%) 0.01 ( 1%) 0.03 ( 0%) 1141 kB ( 0%)
tree slp vectorization : 0.01 ( 0%) 0.02 ( 1%) 0.03 ( 0%) 5032 kB ( 1%)
tree iv optimization : 0.02 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 2110 kB ( 0%)
predictive commoning : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 302 kB ( 0%)
gimple CSE reciprocals : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%)
dominance computation : 0.14 ( 2%) 0.03 ( 2%) 0.16 ( 2%) 0 kB ( 0%)
out of ssa : 0.05 ( 1%) 0.00 ( 0%) 0.01 ( 0%) 55 kB ( 0%)
expand vars : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 1422 kB ( 0%)
expand : 0.03 ( 0%) 0.01 ( 1%) 0.10 ( 1%) 14790 kB ( 2%)
post expand cleanups : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 1273 kB ( 0%)
varconst : 0.00 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 8 kB ( 0%)
jump : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%)
forward prop : 0.02 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 1330 kB ( 0%)
CSE : 0.13 ( 1%) 0.00 ( 0%) 0.08 ( 1%) 664 kB ( 0%)
dead code elimination : 0.00 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 0 kB ( 0%)
dead store elim1 : 0.02 ( 0%) 0.00 ( 0%) 0.06 ( 1%) 1230 kB ( 0%)
dead store elim2 : 0.05 ( 1%) 0.00 ( 0%) 0.03 ( 0%) 1584 kB ( 0%)
loop init : 0.11 ( 1%) 0.02 ( 1%) 0.07 ( 1%) 8638 kB ( 1%)
loop versioning : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 40 kB ( 0%)
loop invariant motion : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 8 kB ( 0%)
CPROP : 0.12 ( 1%) 0.00 ( 0%) 0.06 ( 1%) 3321 kB ( 0%)
PRE : 0.08 ( 1%) 0.00 ( 0%) 0.05 ( 0%) 935 kB ( 0%)
CSE 2 : 0.07 ( 1%) 0.00 ( 0%) 0.08 ( 1%) 333 kB ( 0%)
branch prediction : 0.02 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 1178 kB ( 0%)
combiner : 0.21 ( 2%) 0.00 ( 0%) 0.15 ( 1%) 7070 kB ( 1%)
if-conversion : 0.02 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 464 kB ( 0%)
integrated RA : 0.25 ( 3%) 0.01 ( 1%) 0.30 ( 3%) 20626 kB ( 2%)
LRA non-specific : 0.10 ( 1%) 0.00 ( 0%) 0.09 ( 1%) 1243 kB ( 0%)
LRA virtuals elimination : 0.02 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 834 kB ( 0%)
LRA reload inheritance : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 195 kB ( 0%)
LRA create live ranges : 0.11 ( 1%) 0.01 ( 1%) 0.13 ( 1%) 234 kB ( 0%)
LRA hard reg assignment : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 kB ( 0%)
LRA rematerialization : 0.04 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 kB ( 0%)
reload : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 kB ( 0%)
reload CSE regs : 0.09 ( 1%) 0.00 ( 0%) 0.06 ( 1%) 2212 kB ( 0%)
load CSE after reload : 0.06 ( 1%) 0.00 ( 0%) 0.05 ( 0%) 559 kB ( 0%)
ree : 0.00 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 71 kB ( 0%)
thread pro- & epilogue : 0.03 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 939 kB ( 0%)
peephole 2 : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 170 kB ( 0%)
hard reg cprop : 0.00 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 15 kB ( 0%)
scheduling 2 : 0.15 ( 2%) 0.00 ( 0%) 0.16 ( 2%) 894 kB ( 0%)
machine dep reorg : 0.00 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 502 kB ( 0%)
reorder blocks : 0.04 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 1015 kB ( 0%)
shorten branches : 0.02 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%)
final : 0.04 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 3408 kB ( 0%)
straight-line strength reduction : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 21 kB ( 0%)
tree loop if-conversion : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 203 kB ( 0%)
rest of compilation : 0.10 ( 1%) 0.01 ( 1%) 0.13 ( 1%) 3241 kB ( 0%)
remove unused locals : 0.02 ( 0%) 0.00 ( 0%) 0.08 ( 1%) 3 kB ( 0%)
address taken : 0.04 ( 0%) 0.01 ( 1%) 0.04 ( 0%) 0 kB ( 0%)
TOTAL : 8.75 1.63 10.40 874064 kB
Edit
A few people asked in the comments for the compiler flags; here they are:
-std=c++17 -Wall -Ofast -DNDEBUG -Wno-deprecated-declarations

Unexpected minflt (minor page faults)

A minor page fault means that a virtual memory page has to be mapped to physical memory. However, my test code still incurs minor page faults when it accesses memory that it has already used.
My test code:
#include <iostream>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
using namespace std;
int main() {
    uint64_t len = 1024ll * 128; //128k
    uint32_t times = 1000000000;
    char *p = new char[len];
    for (uint32_t t = 0; t < times; ++t) {
        for (uint64_t i = 0; i < len; ++i) {
            *(p + i) = 100;
        }
    }
    delete[] p;
    return 0;
}
pidstat:
03:40:05 PM UID PID minflt/s majflt/s VSZ RSS %MEM Command
03:40:06 PM 0 42379 34.00 0.00 12672 1196 0.00 main
03:40:07 PM 0 42379 0.00 0.00 12672 1196 0.00 main
03:40:08 PM 0 42379 0.00 0.00 12672 1196 0.00 main
03:40:09 PM 0 42379 0.00 0.00 12672 1196 0.00 main
03:40:10 PM 0 42379 0.00 0.00 12672 1196 0.00 main
03:40:11 PM 0 42379 0.00 0.00 12672 1196 0.00 main
03:40:12 PM 0 42379 0.00 0.00 12672 1196 0.00 main
03:40:13 PM 0 42379 0.00 0.00 12672 1196 0.00 main
03:40:14 PM 0 42379 34.00 0.00 12672 1196 0.00 main
03:40:15 PM 0 42379 0.00 0.00 12672 1196 0.00 main
03:40:16 PM 0 42379 0.00 0.00 12672 1196 0.00 main
03:40:17 PM 0 42379 0.00 0.00 12672 1196 0.00 main
03:40:18 PM 0 42379 0.00 0.00 12672 1196 0.00 main
03:40:19 PM 0 42379 0.00 0.00 12672 1196 0.00 main
03:40:20 PM 0 42379 0.00 0.00 12672 1196 0.00 main
03:40:21 PM 0 42379 0.00 0.00 12672 1196 0.00 main
03:40:22 PM 0 42379 0.00 0.00 12672 1196 0.00 main
03:40:23 PM 0 42379 0.00 0.00 12672 1196 0.00 main
03:40:24 PM 0 42379 0.00 0.00 12672 1196 0.00 main
03:40:25 PM 0 42379 0.00 0.00 12672 1196 0.00 main
03:40:26 PM 0 42379 0.00 0.00 12672 1196 0.00 main
03:40:27 PM 0 42379 0.00 0.00 12672 1196 0.00 main
03:40:28 PM 0 42379 34.00 0.00 12672 1196 0.00 main
03:40:29 PM 0 42379 0.00 0.00 12672 1196 0.00 main
03:40:30 PM 0 42379 0.00 0.00 12672 1196 0.00 main
A minflt/s value of 34 corresponds to 34 * 4K = 136K.
The array in my test code is 128K.
Why does my test code produce minor page faults when it accesses memory that is already in use?
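To see where the faults happen from inside the process rather than through pidstat's per-second sampling, the counter can be read around each pass over the buffer. This is a minimal sketch under assumptions: it relies on the POSIX `getrusage` call and its `ru_minflt` field, and `minor_faults` and `faults_per_pass` are hypothetical helper names that are not part of the original test program.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#include <sys/resource.h>  // POSIX getrusage

// Minor page faults charged to this process so far.
inline long minor_faults() {
    rusage usage{};
    const int rc = getrusage(RUSAGE_SELF, &usage);
    assert(rc == 0);
    (void)rc;  // silence unused-variable warnings when NDEBUG is defined
    return usage.ru_minflt;
}

// Writes every byte of a fresh buffer `passes` times and returns the number
// of minor faults observed during each pass.
inline std::vector<long> faults_per_pass(std::size_t len, int passes) {
    assert(passes >= 0);
    std::vector<long> deltas;
    deltas.reserve(static_cast<std::size_t>(passes));  // no reallocation while measuring

    char* buffer = new char[len];
    volatile char* p = buffer;  // volatile keeps the stores from being optimized away
    for (int pass = 0; pass < passes; ++pass) {
        const long before = minor_faults();
        for (std::size_t i = 0; i < len; ++i) p[i] = 100;
        deltas.push_back(minor_faults() - before);
    }
    delete[] buffer;
    return deltas;
}
```

Calling `faults_per_pass(128 * 1024, 3)` gives one count per pass. If later passes report faults as well, the pages are being remapped while the program runs; if only the first pass does, the periodic bursts that pidstat shows would have to come from somewhere other than this loop.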

Finding the Mode in a Vector of Floats

I am trying to find the mode average in a vector containing 324 float values.
The code I have is as follows:
float max = vec.back();
float prev = max;
float mode = 0.0;
int maxcount = 0;
int currcount = 0;
for (const auto n : vec) {
    if (n == prev) {
        ++currcount;
        if (currcount > maxcount) {
            maxcount = currcount;
            mode = n;
        }
    } else {
        currcount = 1;
    }
    prev = n;
}
std::cout << mode << std::endl;
This prints out the mode to be 0.75, which is wrong.
Here are all the float values, they come from a txt file so please excuse the format:
0.61 0.61 0.61 0.62 0.62 0.62 0.62 0.62 0.62 0.62 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.71 0.71 0.71 0.71 0.71 0.71 0.71 0.71 0.71 0.71 0.71 0.71 0.71 0.71 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.73 0.73 0.73 0.73 0.73 0.73 0.73 0.73 0.73 0.73 0.73 0.73 0.73 0.74 0.74 0.74 0.74 0.74 0.74 0.74 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.76 0.76 0.76 0.76 0.76 0.76 0.76 0.76 0.76 0.77 0.77 0.77 0.77 0.77 0.77 0.77 0.78 0.78 0.78 0.78 0.78 0.78 0.78 0.78 0.78 0.78 0.79 0.79 0.79 0.79 0.79 0.79 0.79 0.79 0.79 0.79 0.79
Excel presents the mode as 0.65. Why does my code not produce the same result? What do I need to change?
Many thanks.
Edit: through debugging I have found that the values within vec are more like 0.68000000000000005 or 0.69999999999999996, though some still have only two decimal places (0.64, 0.74 etc). Could this be the issue? Can I round the values for this particular calculation?
The problem might be the use of == to compare floats. Because of how they are stored, floating point numbers generally differ by a small amount from the value they are initialized to.
Instead of using n == prev, consider a comparison within some small epsilon that is greater than the machine precision (for any machine you expect to run this code on) but less than the smallest true difference between any two of your numbers (which looks like 0.01). So you could do
if (((n - prev) < EPSILON) && ((prev - n) < EPSILON)) { ...
for float EPSILON = 0.000001, or a value that makes sense for you.
See also this question on comparing floats. Of note is that the ideal epsilon would change if your data set changed to much larger or much smaller numbers.
Even if there is another problem in your code, you might consider moving away from comparing floats in general.
By debugging I found that my values did not have just two decimal places, so the mode was actually 0.7500000000004, but it was still being printed as 0.75.
By adding a rounding function call and removing the const, I was able to find the mode to two decimal places.
for (auto n : vec)
{
    n = roundf(n * 100) / 100;
    if (n == prev)
    {
        ++currcount;
        if (currcount > maxcount)
        {
            maxcount = currcount;
            mode = n;
        }
    } else
    {
        currcount = 1;
    }
    prev = n;
}
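The rounding idea can be taken one step further by counting the values as integer hundredths, which avoids comparing floats altogether and does not depend on the input being sorted. This is a minimal sketch rather than the poster's final code; `mode_two_decimals` is a hypothetical name, and it assumes that two decimal places is the intended precision.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <vector>

// Returns the most frequent value after rounding every input to two decimals.
// On a tie, the smallest of the tied values is returned.
inline double mode_two_decimals(const std::vector<float>& values) {
    assert(!values.empty());

    // Key: the value expressed in integer hundredths, e.g. 0.65f -> 65.
    std::map<long, int> counts;
    for (float v : values) {
        ++counts[std::lround(v * 100.0f)];
    }

    long best_key = 0;
    int best_count = 0;
    for (const auto& entry : counts) {
        if (entry.second > best_count) {  // strict '>' keeps the smallest key on ties
            best_count = entry.second;
            best_key = entry.first;
        }
    }
    return static_cast<double>(best_key) / 100.0;
}
```

Because the keys are integers, two inputs that print as 0.65 but differ in their last bits land in the same bucket.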

Interpreting profiler log for STL based code

I generalized an algorithm implementation.
Now, the new implementation runs more than 100 times slower than the old one.
My guess is that the source of the inefficiency is unnecessary implicit copy-constructor calls that I somehow introduced. I tried to profile the code, but I get a lot of data that I do not understand. Do I really need to know STL internals to be able to profile STL based code?
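One way to test the copy-constructor hypothesis without reading STL internals is to count copies directly. The sketch below is illustrative only: `Tracked`, `total_by_value` and `total_by_ref` are hypothetical names, and the real code may introduce its copies elsewhere (for example through return values or range-for loops over `auto` instead of `const auto&`).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical payload that counts how often it is copy-constructed.
struct Tracked {
    static long copies;
    std::vector<short> data;

    explicit Tracked(std::size_t n) : data(n) {}
    Tracked(const Tracked& other) : data(other.data) { ++copies; }
    Tracked(Tracked&&) noexcept = default;
    Tracked& operator=(const Tracked&) = default;
    Tracked& operator=(Tracked&&) noexcept = default;
};
long Tracked::copies = 0;

// Pass by value: every call copies the container and each element in it.
inline std::size_t total_by_value(std::vector<Tracked> items) {
    std::size_t total = 0;
    for (const Tracked& t : items) total += t.data.size();
    return total;
}

// Pass by const reference: the caller's container is read in place.
inline std::size_t total_by_ref(const std::vector<Tracked>& items) {
    std::size_t total = 0;
    for (const Tracked& t : items) total += t.data.size();
    return total;
}
```

Resetting `Tracked::copies` before a call and reading it afterwards shows how many element copies that call made. A function that is invoked millions of times and takes its containers by value would produce call counts on the copy constructors similar to those in a gprof flat profile.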
Snippet of the flat profile:
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
5.22 0.52 0.52 9092637 0.00 0.00 std::_List_base<unsigned int, std::allocator<unsigned int> >::_M_clear()
4.11 0.93 0.41 25264275 0.00 0.00 std::_List_node<unsigned int>* std::list<unsigned int, std::allocator<unsigned int> >::_M_create_node<unsigned int const&>(unsigned int const&)
3.66 1.29 0.36 9084123 0.00 0.00 void std::list<unsigned int, std::allocator<unsigned int> >::_M_initialize_dispatch<std::_List_const_iterator<unsigned int> >(std::_List_const_iterator<unsigned int>, std::_List_const_iterator<unsigned int>, std::__false_type)
3.11 1.60 0.31 25264275 0.00 0.00 std::_List_node<unsigned int>::_List_node<unsigned int const&>(unsigned int const&)
2.61 1.86 0.26 101061221 0.00 0.00 unsigned int const& std::forward<unsigned int const&>(std::remove_reference<unsigned int const&>::type&)
2.56 2.12 0.26 25264275 0.00 0.00 std::list<unsigned int, std::allocator<unsigned int> >::push_back(unsigned int const&)
2.51 2.37 0.25 25264275 0.00 0.00 void std::list<unsigned int, std::allocator<unsigned int> >::_M_insert<unsigned int const&>(std::_List_iterator<unsigned int>, unsigned int const&)
2.41 2.61 0.24 9080201 0.00 0.00 std::vector<short, std::allocator<short> >::vector(std::vector<short, std::allocator<short> > const&)
2.21 2.83 0.22 9082855 0.00 0.00 std::list<unsigned int, std::allocator<unsigned int> >::list(std::list<unsigned int, std::allocator<unsigned int> > const&)
2.16 3.04 0.21 25264275 0.00 0.00 void __gnu_cxx::new_allocator<std::_List_node<unsigned int> >::construct<unsigned int const&>(std::_List_node<unsigned int>*, unsigned int const&)
2.01 3.25 0.20 25270362 0.00 0.00 __gnu_cxx::new_allocator<std::_List_node<unsigned int> >::allocate(unsigned int, void const*)
1.76 3.42 0.17 9091186 0.00 0.00 std::vector<short, std::allocator<short> >::size() const
1.71 3.59 0.17 50552766 0.00 0.00 std::_List_base<unsigned int, std::allocator<unsigned int> >::_M_get_Node_allocator()
1.71 3.76 0.17 25270362 0.00 0.00 std::_List_base<unsigned int, std::allocator<unsigned int> >::_M_put_node(std::_List_node<unsigned int>*)
1.65 3.92 0.17 9084123 0.00 0.00 std::_List_base<unsigned int, std::allocator<unsigned int> >::_List_base(std::allocator<unsigned int> const&)
1.55 4.08 0.15 9055760 0.00 0.00 bool __gnu_cxx::operator!=<std::vector<short, std::allocator<short> > const*, std::vector<std::vector<short, std::allocator<short> >, std::allocator<std::vector<short, std::allocator<short> > > > >(__gnu_cxx::__normal_iterator<std::vector<short, std::allocator<short> > const*, std::vector<std::vector<short, std::allocator<short> >, std::allocator<std::vector<short, std::allocator<short> > > > > const&, __gnu_cxx::__normal_iterator<std::vector<short, std::allocator<short> > const*, std::vector<std::vector<short, std::allocator<short> >, std::allocator<std::vector<short, std::allocator<short> > > > > const&)
1.50 4.23 0.15 25249466 0.00 0.00 std::_List_const_iterator<unsigned int>::operator++()
1.45 4.38 0.14 9084596 0.00 0.00 std::_Vector_base<short, std::allocator<short> >::~_Vector_base()
1.45 4.52 0.14 9051640 0.00 0.00 void std::_Construct<std::list<unsigned int, std::allocator<unsigned int> >, std::list<unsigned int, std::allocator<unsigned int> > const&>(std::list<unsigned int, std::allocator<unsigned int> >*, std::list<unsigned int, std::allocator<unsigned int> > const&)
1.40 4.66 0.14 4120 0.00 0.00 std::list<unsigned int, std::allocator<unsigned int> >* std::__uninitialized_copy<false>::__uninit_copy<__gnu_cxx::__normal_iterator<std::list<unsigned int, std::allocator<unsigned int> > const*, std::vector<std::list<unsigned int, std::allocator<unsigned int> >, std::allocator<std::list<unsigned int, std::allocator<unsigned int> > > > >, std::list<unsigned int, std::allocator<unsigned int> >*>(__gnu_cxx::__normal_iterator<std::list<unsigned int, std::allocator<unsigned int> > const*, std::vector<std::list<unsigned int, std::allocator<unsigned int> >, std::allocator<std::list<unsigned int, std::allocator<unsigned int> > > > >, __gnu_cxx::__normal_iterator<std::list<unsigned int, std::allocator<unsigned int> > const*, std::vector<std::list<unsigned int, std::allocator<unsigned int> >, std::allocator<std::list<unsigned int, std::allocator<unsigned int> > > > >, std::list<unsigned int, std::allocator<unsigned int> >*)
1.40 4.80 0.14 4120 0.00 0.00 std::vector<short, std::allocator<short> >* std::__uninitialized_copy<false>::__uninit_copy<__gnu_cxx::__normal_iterator<std::vector<short, std::allocator<short> > const*, std::vector<std::vector<short, std::allocator<short> >, std::allocator<std::vector<short, std::allocator<short> > > > >, std::vector<short, std::allocator<short> >*>(__gnu_cxx::__normal_iterator<std::vector<short, std::allocator<short> > const*, std::vector<std::vector<short, std::allocator<short> >, std::allocator<std::vector<short, std::allocator<short> > > > >, __gnu_cxx::__normal_iterator<std::vector<short, std::allocator<short> > const*, std::vector<std::vector<short, std::allocator<short> >, std::allocator<std::vector<short, std::allocator<short> > > > >, std::vector<short, std::allocator<short> >*)
1.30 4.93 0.13 9080202 0.00 0.00 std::_Vector_base<short, std::allocator<short> >::_Vector_impl::_Vector_impl(std::allocator<short> const&)
1.25 5.05 0.12 9051640 0.00 0.00 void std::_Construct<std::vector<short, std::allocator<short> >, std::vector<short, std::allocator<short> > const&>(std::vector<short, std::allocator<short> >*, std::vector<short, std::allocator<short> > const&)
1.20 5.17 0.12 9092637 0.00 0.00 std::list<unsigned int, std::allocator<unsigned int> >::~list()
1.20 5.29 0.12 4123 0.00 0.00 void std::_Destroy_aux<false>::__destroy<std::vector<short, std::allocator<short> >*>(std::vector<short, std::allocator<short> >*, std::vector<short, std::allocator<short> >*)
1.15 5.41 0.12 34333589 0.00 0.00 std::_List_const_iterator<unsigned int>::operator!=(std::_List_const_iterator<unsigned int> const&) const
1.10 5.52 0.11 9084596 0.00 0.00 std::vector<short, std::allocator<short> >::~vector()
1.10 5.63 0.11 9082398 0.00 0.00 short* std::copy<__gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > >, short*>(__gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > >, __gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > >, short*)
1.05 5.74 0.10 9084123 0.00 0.00 std::_List_base<unsigned int, std::allocator<unsigned int> >::_List_impl::_List_impl(std::allocator<std::_List_node<unsigned int> > const&)
1.00 5.83 0.10 25532990 0.00 0.00 std::_List_iterator<unsigned int>::_List_iterator(std::__detail::_List_node_base*)
1.00 5.93 0.10 18164796 0.00 0.00 std::_Iter_base<__gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > >, false>::_S_base(__gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > >)
1.00 6.04 0.10 9092637 0.00 0.00 std::_List_base<unsigned int, std::allocator<unsigned int> >::~_List_base()
1.00 6.13 0.10 9080202 0.00 0.00 std::_Vector_base<short, std::allocator<short> >::_Vector_base(unsigned int, std::allocator<short> const&)
0.95 6.23 0.10 9082398 0.00 0.00 short* std::__uninitialized_copy<true>::__uninit_copy<__gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > >, short*>(__gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > >, __gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > >, short*)
0.95 6.33 0.10 9082398 0.00 0.00 short* std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > >, short*>(__gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > >, __gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > >, short*)
0.90 6.42 0.09 9107748 0.00 0.00 std::vector<short, std::allocator<short> >::begin() const
0.90 6.50 0.09 9086793 0.00 0.00 void std::_Destroy<short*>(short*, short*)
0.90 6.59 0.09 9085052 0.00 0.00 std::list<unsigned int, std::allocator<unsigned int> >::begin() const
0.90 6.68 0.09 4123 0.00 0.00 void std::_Destroy_aux<false>::__destroy<std::list<unsigned int, std::allocator<unsigned int> >*>(std::list<unsigned int, std::allocator<unsigned int> >*, std::list<unsigned int, std::allocator<unsigned int> >*)
0.85 6.77 0.09 18164796 0.00 0.00 std::_Miter_base<__gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > > >::iterator_type std::__miter_base<__gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > > >(__gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > >)
0.85 6.86 0.09 9084123 0.00 0.00 std::allocator<std::_List_node<unsigned int> >::allocator<unsigned int>(std::allocator<unsigned int> const&)
0.85 6.94 0.09 9055760 0.00 0.00 bool __gnu_cxx::operator!=<std::list<unsigned int, std::allocator<unsigned int> > const*, std::vector<std::list<unsigned int, std::allocator<unsigned int> >, std::allocator<std::list<unsigned int, std::allocator<unsigned int> > > > >(__gnu_cxx::__normal_iterator<std::list<unsigned int, std::allocator<unsigned int> > const*, std::vector<std::list<unsigned int, std::allocator<unsigned int> >, std::allocator<std::list<unsigned int, std::allocator<unsigned int> > > > > const&, __gnu_cxx::__normal_iterator<std::list<unsigned int, std::allocator<unsigned int> > const*, std::vector<std::list<unsigned int, std::allocator<unsigned int> >, std::allocator<std::list<unsigned int, std::allocator<unsigned int> > > > > const&)
0.80 7.02 0.08 18280452 0.00 0.00 __gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > >::__normal_iterator(short const* const&)
0.80 7.10 0.08 18164796 0.00 0.00 std::_Iter_base<__gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > >, true>::_S_base(__gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > >)
0.80 7.18 0.08 9092637 0.00 0.00 std::_List_base<unsigned int, std::allocator<unsigned int> >::_M_init()
0.80 7.26 0.08 9082398 0.00 0.00 short* std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<short>(short const*, short const*, short*)
0.75 7.33 0.07 43405668 0.00 0.00 operator new(unsigned int, void*)
0.75 7.41 0.07 9085052 0.00 0.00 std::list<unsigned int, std::allocator<unsigned int> >::end() const
0.75 7.49 0.07 9082398 0.00 0.00 short* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > >, short*, short>(__gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > >, __gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > >, short*, std::allocator<short>&)
0.75 7.56 0.07 9053837 0.00 0.00 void std::_Destroy<std::list<unsigned int, std::allocator<unsigned int> > >(std::list<unsigned int, std::allocator<unsigned int> >*)
0.70 7.63 0.07 25470908 0.00 0.00 std::list<unsigned int, std::allocator<unsigned int> >::end()
0.70 7.70 0.07 25270362 0.00 0.00 std::_List_base<unsigned int, std::allocator<unsigned int> >::_M_get_node()
0.70 7.77 0.07 25249466 0.00 0.00 std::_List_const_iterator<unsigned int>::operator*() const
0.70 7.84 0.07 9134142 0.00 0.00 std::vector<short, std::allocator<short> >::end() const
0.70 7.91 0.07 9086793 0.00 0.00 void std::_Destroy<short*, short>(short*, short*, std::allocator<short>&)
0.70 7.98 0.07 9080202 0.00 0.00 std::allocator<short>::allocator(std::allocator<short> const&)
0.65 8.04 0.07 18170104 0.00 0.00 std::_List_const_iterator<unsigned int>::_List_const_iterator(std::__detail::_List_node_base const*)
0.65 8.11 0.07 18164796 0.00 0.00 std::_Niter_base<__gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > > >(__gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > >)
0.65 8.18 0.07 18107674 0.00 0.00 std::list<unsigned int, std::allocator<unsigned int> >* std::__addressof<std::list<unsigned int, std::allocator<unsigned int> > >(std::list<unsigned int, std::allocator<unsigned int> >&)
0.65 8.24 0.07 9051640 0.00 0.00 __gnu_cxx::__normal_iterator<std::list<unsigned int, std::allocator<unsigned int> > const*, std::vector<std::list<unsigned int, std::allocator<unsigned int> >, std::allocator<std::list<unsigned int, std::allocator<unsigned int> > > > >::operator++()
0.60 8.30 0.06 25270362 0.00 0.00 __gnu_cxx::new_allocator<std::_List_node<unsigned int> >::deallocate(std::_List_node<unsigned int>*, unsigned int)
0.60 8.36 0.06 25270362 0.00 0.00 __gnu_cxx::new_allocator<std::_List_node<unsigned int> >::destroy(std::_List_node<unsigned int>*)
0.60 8.42 0.06 18176760 0.00 0.00 __gnu_cxx::new_allocator<std::_List_node<unsigned int> >::~new_allocator()
0.60 8.48 0.06 18111520 0.00 0.00 __gnu_cxx::__normal_iterator<std::vector<short, std::allocator<short> > const*, std::vector<std::vector<short, std::allocator<short> >, std::allocator<std::vector<short, std::allocator<short> > > > >::base() const
0.60 8.54 0.06 9084596 0.00 0.00 std::_Vector_base<short, std::allocator<short> >::_Vector_impl::~_Vector_impl()
0.60 8.60 0.06 9084596 0.00 0.00 std::_Niter_base<short*>::iterator_type std::__niter_base<short*>(short*)
0.50 8.65 0.05 25270362 0.00 0.00 std::_List_node<unsigned int>::~_List_node()
0.50 8.70 0.05 18268284 0.00 0.00 __gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > >::base() const
0.45 8.74 0.04 9053837 0.00 0.00 void std::_Destroy<std::vector<short, std::allocator<short> > >(std::vector<short, std::allocator<short> >*)
0.45 8.79 0.04 24348 0.00 0.00 unsigned int&& std::forward<unsigned int>(std::remove_reference<unsigned int>::type&)
0.40 8.83 0.04 18176760 0.00 0.00 std::allocator<std::_List_node<unsigned int> >::~allocator()
0.40 8.87 0.04 18107674 0.00 0.00 std::vector<short, std::allocator<short> >* std::__addressof<std::vector<short, std::allocator<short> > >(std::vector<short, std::allocator<short> >&)
0.40 8.91 0.04 9086793 0.00 0.00 void std::_Destroy_aux<true>::__destroy<short*>(short*, short*)
0.40 8.95 0.04 9084596 0.00 0.00 std::_Vector_base<short, std::allocator<short> >::_M_allocate(unsigned int)
0.40 8.99 0.04 9082398 0.00 0.00 short* std::__copy_move_a<false, short const*, short*>(short const*, short const*, short*)
0.40 9.03 0.04 9082398 0.00 0.00 short* std::uninitialized_copy<__gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > >, short*>(__gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > >, __gnu_cxx::__normal_iterator<short const*, std::vector<short, std::allocator<short> > >, short*)
0.40 9.07 0.04 9051640 0.00 0.00 __gnu_cxx::__normal_iterator<std::list<unsigned int, std::allocator<unsigned int> > const*, std::vector<std::list<unsigned int, std::allocator<unsigned int> >, std::allocator<std::list<unsigned int, std::allocator<unsigned int> > > > >::operator*() const
0.35 9.11 0.04 18171389 0.00 0.00 std::_Vector_base<short, std::allocator<short> >::_M_get_Tp_allocator()
0.35 9.14 0.04 9084123 0.00 0.00 std::allocator<std::_List_node<unsigned int> >::allocator(std::allocator<std::_List_node<unsigned int> > const&)
0.35 9.18 0.04 9051640 0.00 0.00 __gnu_cxx::__normal_iterator<std::vector<short, std::allocator<short> > const*, std::vector<std::vector<short, std::allocator<short> >, std::allocator<std::vector<short, std::allocator<short> > > > >::operator++()
0.35 9.21 0.04 9051640 0.00 0.00 std::vector<short, std::allocator<short> > const& std::forward<std::vector<short, std::allocator<short> > const&>(std::remove_reference<std::vector<short, std::allocator<short> > const&>::type&)
0.35 9.24 0.04 1 0.04 0.04 redelemeier_with_pruning::full_convex_counter_3d(int, int, unsigned long long, unsigned long long, std::vector<unsigned long long, std::allocator<unsigned long long> >*, std::basic_ofstream<char, std::char_traits<char> >*)
0.30 9.28 0.03 9088244 0.00 0.00 std::allocator<unsigned int>::~allocator()
0.30 9.30 0.03 9086793 0.00 0.00 std::_Vector_base<short, std::allocator<short> >::_M_deallocate(short*, unsigned int)
0.25 9.33 0.03 25270362 0.00 0.00 __gnu_cxx::new_allocator<std::_List_node<unsigned int> >::max_size() const
0.25 9.36 0.03 9084596 0.00 0.00 __gnu_cxx::new_allocator<short>::allocate(unsigned int, void const*)
0.25 9.38 0.03 9084123 0.00 0.00 std::allocator<unsigned int>::allocator<std::_List_node<unsigned int> >(std::allocator<std::_List_node<unsigned int> > const&)
0.25 9.40 0.03 9082855 0.00 0.00 std::_List_base<unsigned int, std::allocator<unsigned int> >::_M_get_Node_allocator() const
0.20 9.43 0.02 18111520 0.00 0.00 __gnu_cxx::__normal_iterator<std::list<unsigned int, std::allocator<unsigned int> > const*, std::vector<std::list<unsigned int, std::allocator<unsigned int> >, std::allocator<std::list<unsigned int, std::allocator<unsigned int> > > > >::base() const
0.20 9.45 0.02 9092637 0.00 0.00 __gnu_cxx::new_allocator<std::_List_node<unsigned int> >::new_allocator()
0.20 9.46 0.02 9092637 0.00 0.00 std::_List_base<unsigned int, std::allocator<unsigned int> >::_List_impl::~_List_impl()
0.20 9.48 0.02 9088244 0.00 0.00 __gnu_cxx::new_allocator<unsigned int>::~new_allocator()
0.20 9.51 0.02 9084596 0.00 0.00 __gnu_cxx::new_allocator<short>::deallocate(short*, unsigned int)
0.20 9.53 0.02 9080201 0.00 0.00 std::_Vector_base<short, std::allocator<short> >::_M_get_Tp_allocator() const
0.20 9.54 0.02 9051640 0.00 0.00 __gnu_cxx::__normal_iterator<std::vector<short, std::allocator<short> > const*, std::vector<std::vector<short, std::allocator<short> >, std::allocator<std::vector<short, std::allocator<short> > > > >::operator*() const
0.20 9.56 0.02 495658 0.00 0.00 __gnu_cxx::__normal_iterator<unsigned int*, std::vector<unsigned int, std::allocator<unsigned int> > >::__normal_iterator(unsigned int* const&)
0.20 9.59 0.02 34461 0.00 0.00 std::list<unsigned int, std::allocator<unsigned int> >::begin()
0.20 9.61 0.02 18637 0.00 0.00 std::_List_iterator<unsigned int>::operator==(std::_List_iterator<unsigned int> const&) const
0.20 9.62 0.02 6087 0.00 0.00 std::_List_node<unsigned int>* std::list<unsigned int, std::allocator<unsigned int> >::_M_create_node<unsigned int>(unsigned int&&)
0.20 9.64 0.02 6087 0.00 0.00 std::remove_reference<unsigned int&>::type&& std::move<unsigned int&>(unsigned int&)
0.15 9.66 0.01 9084597 0.00 0.00 __gnu_cxx::new_allocator<short>::~new_allocator()
0.15 9.68 0.01 9084123 0.00 0.00 __gnu_cxx::new_allocator<std::_List_node<unsigned int> >::new_allocator(__gnu_cxx::new_allocator<std::_List_node<unsigned int> > const&)
0.15 9.69 0.01 31742 0.00 0.00 std::vector<unsigned int, std::allocator<unsigned int> >::begin()
0.15 9.71 0.01 6087 0.00 0.00 void std::list<unsigned int, std::allocator<unsigned int> >::_M_insert<unsigned int>(std::_List_iterator<unsigned int>, unsigned int&&)
0.15 9.72 0.01 4395 0.00 0.00 __gnu_cxx::new_allocator<short>::new_allocator()
0.15 9.73 0.01 2 0.01 0.01 __gnu_cxx::new_allocator<std::vector<short, std::allocator<short> > >::new_allocator()
0.15 9.75 0.01 __gnu_cxx::__normal_iterator<short*, std::vector<short, std::allocator<short> > >::base() const
0.15 9.77 0.01 std::_Niter_base<__gnu_cxx::__normal_iterator<short*, std::vector<short, std::allocator<short> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<short*, std::vector<short, std::allocator<short> > > >(__gnu_cxx::__normal_iterator<short*, std::vector<short, std::allocator<short> > >)
0.10 9.78 0.01 9084596 0.00 0.00 __gnu_cxx::new_allocator<short>::max_size() const
0.10 9.79 0.01 9084596 0.00 0.00 std::_Iter_base<short*, false>::_S_base(short*)
0.10 9.79 0.01 9084124 0.00 0.00 __gnu_cxx::new_allocator<unsigned int>::new_allocator()
0.10 9.80 0.01 216086 0.00 0.00 bool __gnu_cxx::operator!=<unsigned int*, std::vector<unsigned int, std::allocator<unsigned int> > >(__gnu_cxx::__normal_iterator<unsigned int*, std::vector<unsigned int, std::allocator<unsigned int> > > const&, __gnu_cxx::__normal_iterator<unsigned int*, std::vector<unsigned int, std::allocator<unsigned int> > > const&)
0.10 9.81 0.01 206056 0.00 0.00 std::_Bit_const_iterator::operator*() const
0.10 9.82 0.01 173040 0.00 0.00 std::_Bit_iterator_base::_M_bump_up()
0.10 9.84 0.01 8243 0.00 0.00 __gnu_cxx::new_allocator<unsigned long>::~new_allocator()
0.10 9.85 0.01 8242 0.00 0.00 unsigned int* std::__uninitialized_move_a<unsigned int*, unsigned int*, std::allocator<unsigned int> >(unsigned int*, unsigned int*, unsigned int*, std::allocator<unsigned int>&)
0.10 9.86 0.01 8242 0.00 0.00 void std::_Destroy<unsigned int*, unsigned int>(unsigned int*, unsigned int*, std::allocator<unsigned int>&)
0.10 9.87 0.01 8240 0.00 0.00 std::_Miter_base<unsigned long*>::iterator_type std::__miter_base<unsigned long*>(unsigned long*)
0.10 9.88 0.01 4121 0.00 0.00 std::vector<unsigned int, std::allocator<unsigned int> >::_M_check_len(unsigned int, char const*) const
0.05 9.88 0.01 9084597 0.00 0.00 std::allocator<short>::~allocator()
0.05 9.88 0.01 9080202 0.00 0.00 __gnu_cxx::new_allocator<short>::new_allocator(__gnu_cxx::new_allocator<short> const&)
0.05 9.89 0.01 9051640 0.00 0.00 std::list<unsigned int, std::allocator<unsigned int> > const& std::forward<std::list<unsigned int, std::allocator<unsigned int> > const&>(std::remove_reference<std::list<unsigned int, std::allocator<unsigned int> > const&>::type&)
0.05 9.89 0.01 186330 0.00 0.00 std::vector<unsigned int, std::allocator<unsigned int> >::size() const

sed to replace a character from a column with space but retaining the format

I have a file a.pdb as
ATOM 3201 CD2 LEU A 337 7.734 18.538 6.979 0.00 0.00 0.000 C
ATOM 3202 C LEU A 337 5.169 14.358 7.663 0.00 0.00 0.206 C
ATOM 3203 O LEU A 337 4.123 14.537 8.395 0.00 0.00 -0.646 OA
ATOM 3204 OXT LEU A 337 5.124 13.563 6.672 0.00 0.00 -0.646 OA
HETATM 3206 CA CA A 338 18.241 31.994 15.308 0.00 0.00 0.000 CA
HETATM 3207 CA CA A 339 16.703 30.240 22.272 0.00 0.00 0.000 CA
Desired output:
ATOM 3201 CD2 LEU 337 7.735 18.538 6.979 0.00 0.00 0.000 C
ATOM 3202 C LEU 337 5.169 14.358 7.663 0.00 0.00 0.206 C
ATOM 3203 O LEU 337 4.122 14.537 8.395 0.00 0.00 -0.646 OA
ATOM 3204 OXT LEU 337 5.124 13.562 6.671 0.00 0.00 -0.646 OA
HETATM 3206 CA CA 338 18.242 31.994 15.307 0.00 0.00 0.000 CA
HETATM 3207 CA CA 339 16.703 30.240 22.272 0.00 0.00 0.000 CA
How can I replace the letter "A" in the 22nd column (the 5th field) with a space while retaining the format of a.pdb?
sed -r 's/^(.{21})A/\1 /' a.pdb
(.{21}) matches the first 21 characters of the line and puts them in capture group 1; the replacement writes them back via \1, so only the "A" in column 22 is replaced by a space.
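A slightly more defensive variant of the same idea, as a minimal sketch: it only touches ATOM/HETATM records and blanks whatever letter sits in column 22, not just "A". The function name blank_chain_id and the fixed-width sample line are assumptions for illustration (the spacing follows the usual PDB column layout, which may differ from your file), and -E is used in place of -r because both GNU and BSD sed accept it.

```shell
#!/bin/sh
# Hypothetical helper: blank the chain identifier (column 22) of PDB coordinate records.
blank_chain_id() {
    # "ATOM  " and "HETATM" are both 6 columns wide, so 6 + 15 = 21 characters
    # precede column 22. Lines that do not start with either tag are left alone.
    sed -E 's/^((ATOM  |HETATM).{15})[A-Za-z]/\1 /' "$@"
}

# Assumed fixed-width sample record (truncated after the coordinates).
sample='ATOM   3201  CD2 LEU A 337       7.734  18.538   6.979'
printf '%s\n' "$sample" | blank_chain_id
```

It can also be called with a file name, e.g. blank_chain_id a.pdb, since the arguments are passed straight through to sed.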
Using awk:
awk '$0=substr($0,1,21) FS substr($0,23)' file
$ awk '$0=substr($0,1,21) FS substr($0,23)' file
ATOM 3201 CD2 LEU 337 7.734 18.538 6.979 0.00 0.00 0.000 C
ATOM 3202 C LEU 337 5.169 14.358 7.663 0.00 0.00 0.206 C
ATOM 3203 O LEU 337 4.123 14.537 8.395 0.00 0.00 -0.646 OA
ATOM 3204 OXT LEU 337 5.124 13.563 6.672 0.00 0.00 -0.646 OA
HETATM 3206 CA CA 338 18.241 31.994 15.308 0.00 0.00 0.000 CA
HETATM 3207 CA CA 339 16.703 30.240 22.272 0.00 0.00 0.000 CA
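You can also use gsub in awk, although modifying $5 makes awk rebuild the line with single-space separators, so the original column alignment is not preserved: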
awk '{gsub("A"," ",$5); print}' a.pdb
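To see the difference between editing a field and editing a character position, here is a small sketch that runs both approaches on one line. The sample record and the variable names by_field and by_column are assumptions for illustration; the column-based command is a variation on the substr answer above that only rewrites the line when column 22 actually holds "A".

```shell
#!/bin/sh
# Assumed fixed-width sample record (truncated after the coordinates).
sample='ATOM   3201  CD2 LEU A 337       7.734  18.538   6.979'

# Field-based edit: changing $5 makes awk rebuild the record, joining the
# fields with OFS (a single space by default), so the alignment is lost.
by_field=$(printf '%s\n' "$sample" | awk '{gsub("A"," ",$5); print}')

# Column-based edit: rewrite $0 directly, and only when column 22 is "A".
# The trailing 1 prints every line, changed or not.
by_column=$(printf '%s\n' "$sample" |
    awk 'substr($0,22,1)=="A"{$0=substr($0,1,21) " " substr($0,23)} 1')

printf '%s\n%s\n' "$by_field" "$by_column"
```

The second line of output keeps the same length as the input, which is what a fixed-column format such as PDB needs.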