Merge DATASET1 and DATASET2 files - sas

dataset1
Fuel NOx
Ethanol 3.741
Ethanol 2.295
Ethanol 1.498
Ethanol 2.881
Ethanol 0.76
Ethanol 3.12
Ethanol 0.638
Ethanol 1.17
Ethanol 2.358
Ethanol 0.606
Ethanol 3.669
Ethanol 1
Ethanol 0.981
Ethanol 1.192
Ethanol 0.926
Ethanol 1.59
Ethanol 1.806
Ethanol 1.962
Ethanol 4.028
Ethanol 3.148
Ethanol 1.836
Ethanol 2.845
Ethanol 1.013
Ethanol 0.414
Ethanol 0.812
Ethanol 0.374
Ethanol 3.623
Ethanol 1.869
Ethanol 2.836
Ethanol 3.567
Ethanol 0.866
Ethanol 1.369
Ethanol 0.542
Ethanol 2.739
Ethanol 1.2
Ethanol 1.719
Ethanol 3.423
Ethanol 1.634
Ethanol 1.021
Ethanol 2.157
Ethanol 3.361
Ethanol 1.39
Ethanol 1.947
Ethanol 0.962
Ethanol 0.571
Ethanol 2.219
Ethanol 1.419
Ethanol 3.519
Ethanol 1.732
Ethanol 3.206
Ethanol 2.471
Ethanol 1.777
Ethanol 2.571
Ethanol 3.952
Ethanol 3.931
Ethanol 1.587
Ethanol 1.397
Ethanol 3.536
Ethanol 2.202
Ethanol 0.756
Indolene 4.818
Indolene 2.849
Indolene 3.275
Indolene 4.691
Indolene 4.255
Indolene 5.064
Indolene 2.118
Indolene 4.602
Indolene 2.286
Indolene 0.97
Indolene 3.965
Indolene 5.344
Indolene 3.834
Ethanol 1.62
Ethanol 3.656
Ethanol 2.964
82rongas 6.021
82rongas 4.467
82rongas 3.046
82rongas 1.596
82rongas 0.835
82rongas 5.498
82rongas 5.47
82rongas 4.084
82rongas 0.716
94%Eth 2.382
94%Eth 1.004
94%Eth 0.623
94%Eth 1.03
94%Eth 2.593
94%Eth 2.699
94%Eth 3.177
94%Eth 1.151
94%Eth 0.474
94%Eth 2.814
94%Eth 3.308
94%Eth 3.031
94%Eth 2.537
94%Eth 2.403
94%Eth 2.412
dataset2
Fuel FuelCD
Gasohol F4
Methanol F2
Ethanol F5
94%Eth F6
82rongas F1
Indolene F3
Fuel FuelCD NOx
Gasohol F4
Methanol F2
Ethanol F5
94%Eth F6
82rongas F1
Indolene F3
Ethanol 3.741
Ethanol 2.295
Ethanol 1.498
Ethanol 2.881
Ethanol 0.76
Ethanol 3.12
Ethanol 0.638
Ethanol 1.17
Ethanol 2.358
Ethanol 0.606
Ethanol 3.669
Ethanol 1
Ethanol 0.981
Ethanol 1.192
Ethanol 0.926
Ethanol 1.59
Ethanol 1.806
Ethanol 1.962
Ethanol 4.028
Ethanol 3.148
Ethanol 1.836
Ethanol 2.845
Ethanol 1.013
Ethanol 0.414
Ethanol 0.812
Ethanol 0.374
Ethanol 3.623
Ethanol 1.869
Ethanol 2.836
Ethanol 3.567
Ethanol 0.866
Ethanol 1.369
Ethanol 0.542
Ethanol 2.739
Ethanol 1.2
Ethanol 1.719
Ethanol 3.423
Ethanol 1.634
Ethanol 1.021
Ethanol 2.157
Ethanol 3.361
Ethanol 1.39
Ethanol 1.947
Ethanol 0.962
Ethanol 0.571
Ethanol 2.219
Ethanol 1.419
Ethanol 3.519
Ethanol 1.732
Ethanol 3.206
Ethanol 2.471
Ethanol 1.777
Ethanol 2.571
Ethanol 3.952
Ethanol 3.931
Ethanol 1.587
Ethanol 1.397
Ethanol 3.536
Ethanol 2.202
Ethanol 0.756
Indolene 4.818
Indolene 2.849
Indolene 3.275
Indolene 4.691
Indolene 4.255
Indolene 5.064
Indolene 2.118
Indolene 4.602
Indolene 2.286
Indolene 0.97
Indolene 3.965
Indolene 5.344
Indolene 3.834
Ethanol 1.62
Ethanol 3.656
Ethanol 2.964
82rongas 6.021
82rongas 4.467
82rongas 3.046
82rongas 1.596
82rongas 0.835
82rongas 5.498
82rongas 5.47
82rongas 4.084
82rongas 0.716
94%Eth 2.382
94%Eth 1.004
94%Eth 0.623
94%Eth 1.03
94%Eth 2.593
94%Eth 2.699
94%Eth 3.177
94%Eth 1.151
94%Eth 0.474
94%Eth 2.814
94%Eth 3.308
94%Eth 3.031
94%Eth 2.537
94%Eth 2.403
94%Eth 2.412

The answer is right in the question. Just MERGE the two datasets. Make sure they are both sorted by the BY variable first.
data want;
merge dataset1 dataset2 ;
by fuel;
run;

You can sort the data sets and do MERGE one two; BY fuel;
Example:
proc sort data=one;
by fuel;
proc sort data=two;
by fuel;
data want;
merge one two;
by fuel;
run;
Alternatively, you can use SQL to merge data sets.
proc sql;
create table want as
select
coalesce(one.fuel, two.fuel) as fuel
, one.NOx, two.fuelcd
from one
full join two
on one.fuel = two.fuel
;
quit;

How to understand the output of the -ftime-report flag of gcc?

I profiled the compilation of my code with g++ -ftime-report to try to find a way to speed it up.
Here is the output :
Time variable usr sys wall GGC
phase setup : 0.00 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 1353 kB ( 0%)
phase parsing : 2.06 ( 5%) 1.13 ( 50%) 3.30 ( 8%) 565836 kB ( 30%)
phase lang. deferred : 0.30 ( 1%) 0.06 ( 3%) 0.36 ( 1%) 65727 kB ( 4%)
phase opt and generate : 37.96 ( 94%) 1.07 ( 47%) 39.03 ( 91%) 1224911 kB ( 66%)
|name lookup : 0.23 ( 1%) 0.06 ( 3%) 0.34 ( 1%) 18602 kB ( 1%)
|overload resolution : 0.36 ( 1%) 0.10 ( 4%) 0.41 ( 1%) 83103 kB ( 4%)
garbage collection : 0.42 ( 1%) 0.00 ( 0%) 0.43 ( 1%) 0 kB ( 0%)
dump files : 0.02 ( 0%) 0.01 ( 0%) 0.06 ( 0%) 0 kB ( 0%)
callgraph construction : 0.18 ( 0%) 0.01 ( 0%) 0.17 ( 0%) 12930 kB ( 1%)
callgraph optimization : 0.10 ( 0%) 0.01 ( 0%) 0.05 ( 0%) 371 kB ( 0%)
ipa function summary : 0.07 ( 0%) 0.00 ( 0%) 0.08 ( 0%) 1110 kB ( 0%)
ipa dead code removal : 0.03 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 kB ( 0%)
ipa devirtualization : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 134 kB ( 0%)
ipa cp : 0.10 ( 0%) 0.01 ( 0%) 0.12 ( 0%) 8595 kB ( 0%)
ipa inlining heuristics : 3.18 ( 8%) 0.00 ( 0%) 3.20 ( 7%) 19108 kB ( 1%)
ipa function splitting : 0.21 ( 1%) 0.00 ( 0%) 0.19 ( 0%) 286 kB ( 0%)
ipa reference : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 kB ( 0%)
ipa pure const : 0.05 ( 0%) 0.01 ( 0%) 0.06 ( 0%) 17 kB ( 0%)
ipa icf : 0.12 ( 0%) 0.00 ( 0%) 0.12 ( 0%) 1 kB ( 0%)
ipa SRA : 0.33 ( 1%) 0.03 ( 1%) 0.27 ( 1%) 23892 kB ( 1%)
ipa free inline summary : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%)
cfg construction : 0.04 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 2185 kB ( 0%)
cfg cleanup : 0.63 ( 2%) 0.02 ( 1%) 0.65 ( 2%) 4734 kB ( 0%)
trivially dead code : 0.16 ( 0%) 0.00 ( 0%) 0.09 ( 0%) 0 kB ( 0%)
df scan insns : 0.24 ( 1%) 0.01 ( 0%) 0.22 ( 1%) 8 kB ( 0%)
df multiple defs : 0.21 ( 1%) 0.01 ( 0%) 0.23 ( 1%) 0 kB ( 0%)
df reaching defs : 0.75 ( 2%) 0.00 ( 0%) 0.74 ( 2%) 0 kB ( 0%)
df live regs : 2.00 ( 5%) 0.00 ( 0%) 2.05 ( 5%) 0 kB ( 0%)
df live&initialized regs : 0.69 ( 2%) 0.00 ( 0%) 0.76 ( 2%) 0 kB ( 0%)
df must-initialized regs : 0.61 ( 2%) 0.24 ( 11%) 0.83 ( 2%) 0 kB ( 0%)
df use-def / def-use chains : 0.25 ( 1%) 0.00 ( 0%) 0.26 ( 1%) 0 kB ( 0%)
df reg dead/unused notes : 0.87 ( 2%) 0.00 ( 0%) 0.79 ( 2%) 14516 kB ( 1%)
register information : 0.10 ( 0%) 0.00 ( 0%) 0.15 ( 0%) 0 kB ( 0%)
alias analysis : 0.40 ( 1%) 0.00 ( 0%) 0.34 ( 1%) 28831 kB ( 2%)
alias stmt walking : 0.72 ( 2%) 0.07 ( 3%) 0.64 ( 1%) 5194 kB ( 0%)
register scan : 0.05 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 217 kB ( 0%)
rebuild jump labels : 0.08 ( 0%) 0.00 ( 0%) 0.10 ( 0%) 0 kB ( 0%)
preprocessing : 0.60 ( 1%) 0.59 ( 26%) 1.38 ( 3%) 194467 kB ( 10%)
parser (global) : 0.22 ( 1%) 0.23 ( 10%) 0.38 ( 1%) 102668 kB ( 6%)
parser struct body : 0.27 ( 1%) 0.06 ( 3%) 0.35 ( 1%) 62614 kB ( 3%)
parser function body : 0.35 ( 1%) 0.09 ( 4%) 0.38 ( 1%) 70207 kB ( 4%)
parser inl. func. body : 0.06 ( 0%) 0.04 ( 2%) 0.07 ( 0%) 7795 kB ( 0%)
parser inl. meth. body : 0.16 ( 0%) 0.04 ( 2%) 0.22 ( 1%) 32985 kB ( 2%)
template instantiation : 0.64 ( 2%) 0.14 ( 6%) 0.78 ( 2%) 160006 kB ( 9%)
constant expression evaluation : 0.01 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 348 kB ( 0%)
early inlining heuristics : 0.12 ( 0%) 0.01 ( 0%) 0.09 ( 0%) 50683 kB ( 3%)
inline parameters : 0.16 ( 0%) 0.00 ( 0%) 0.16 ( 0%) 9128 kB ( 0%)
integration : 1.01 ( 3%) 0.13 ( 6%) 1.20 ( 3%) 272019 kB ( 15%)
tree gimplify : 0.09 ( 0%) 0.02 ( 1%) 0.10 ( 0%) 43912 kB ( 2%)
tree eh : 0.15 ( 0%) 0.00 ( 0%) 0.17 ( 0%) 49453 kB ( 3%)
tree CFG construction : 0.02 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 24163 kB ( 1%)
tree CFG cleanup : 0.77 ( 2%) 0.02 ( 1%) 0.86 ( 2%) 570 kB ( 0%)
tree tail merge : 0.10 ( 0%) 0.00 ( 0%) 0.09 ( 0%) 1409 kB ( 0%)
tree VRP : 0.68 ( 2%) 0.00 ( 0%) 0.76 ( 2%) 30167 kB ( 2%)
tree Early VRP : 0.08 ( 0%) 0.00 ( 0%) 0.08 ( 0%) 4515 kB ( 0%)
tree copy propagation : 0.19 ( 0%) 0.00 ( 0%) 0.20 ( 0%) 286 kB ( 0%)
tree PTA : 0.65 ( 2%) 0.00 ( 0%) 0.69 ( 2%) 5326 kB ( 0%)
tree PHI insertion : 0.01 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 5166 kB ( 0%)
tree SSA rewrite : 0.29 ( 1%) 0.02 ( 1%) 0.27 ( 1%) 28108 kB ( 2%)
tree SSA other : 0.04 ( 0%) 0.01 ( 0%) 0.05 ( 0%) 357 kB ( 0%)
tree SSA incremental : 0.38 ( 1%) 0.02 ( 1%) 0.39 ( 1%) 13003 kB ( 1%)
tree operand scan : 0.27 ( 1%) 0.05 ( 2%) 0.21 ( 0%) 41554 kB ( 2%)
dominator optimization : 0.62 ( 2%) 0.03 ( 1%) 0.70 ( 2%) 26865 kB ( 1%)
backwards jump threading : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 1082 kB ( 0%)
tree SRA : 0.07 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 757 kB ( 0%)
isolate eroneous paths : 0.02 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%)
tree CCP : 0.38 ( 1%) 0.01 ( 0%) 0.38 ( 1%) 6460 kB ( 0%)
tree split crit edges : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 6860 kB ( 0%)
tree reassociation : 0.07 ( 0%) 0.00 ( 0%) 0.06 ( 0%) 95 kB ( 0%)
tree PRE : 0.49 ( 1%) 0.07 ( 3%) 0.59 ( 1%) 29233 kB ( 2%)
tree FRE : 0.31 ( 1%) 0.06 ( 3%) 0.37 ( 1%) 5463 kB ( 0%)
tree code sinking : 0.05 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 1998 kB ( 0%)
tree linearize phis : 0.06 ( 0%) 0.00 ( 0%) 0.06 ( 0%) 235 kB ( 0%)
tree backward propagate : 0.02 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 kB ( 0%)
tree forward propagate : 0.15 ( 0%) 0.00 ( 0%) 0.19 ( 0%) 3598 kB ( 0%)
tree phiprop : 0.00 ( 0%) 0.01 ( 0%) 0.01 ( 0%) 162 kB ( 0%)
tree conservative DCE : 0.17 ( 0%) 0.04 ( 2%) 0.18 ( 0%) 121 kB ( 0%)
tree aggressive DCE : 0.09 ( 0%) 0.00 ( 0%) 0.18 ( 0%) 3761 kB ( 0%)
tree buildin call DCE : 0.01 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 38 kB ( 0%)
tree DSE : 0.10 ( 0%) 0.01 ( 0%) 0.13 ( 0%) 2485 kB ( 0%)
PHI merge : 0.02 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 202 kB ( 0%)
complete unrolling : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 242 kB ( 0%)
tree slp vectorization : 0.14 ( 0%) 0.00 ( 0%) 0.16 ( 0%) 40876 kB ( 2%)
tree iv optimization : 0.02 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 484 kB ( 0%)
tree SSA uncprop : 0.02 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 0 kB ( 0%)
tree switch conversion : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%)
gimple CSE sin/cos : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 3 kB ( 0%)
gimple widening/fma detection : 0.04 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 0 kB ( 0%)
tree strlen optimization : 0.05 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 192 kB ( 0%)
dominance frontiers : 0.06 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 0 kB ( 0%)
dominance computation : 0.70 ( 2%) 0.00 ( 0%) 0.78 ( 2%) 0 kB ( 0%)
control dependences : 0.03 ( 0%) 0.00 ( 0%) 0.02 ( 0%) 0 kB ( 0%)
out of ssa : 0.09 ( 0%) 0.00 ( 0%) 0.09 ( 0%) 1047 kB ( 0%)
expand vars : 0.61 ( 2%) 0.00 ( 0%) 0.59 ( 1%) 11361 kB ( 1%)
expand : 0.24 ( 1%) 0.02 ( 1%) 0.27 ( 1%) 110705 kB ( 6%)
post expand cleanups : 0.11 ( 0%) 0.00 ( 0%) 0.09 ( 0%) 11138 kB ( 1%)
varconst : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 9 kB ( 0%)
forward prop : 0.28 ( 1%) 0.00 ( 0%) 0.32 ( 1%) 7463 kB ( 0%)
CSE : 0.47 ( 1%) 0.01 ( 0%) 0.50 ( 1%) 6406 kB ( 0%)
dead code elimination : 0.15 ( 0%) 0.00 ( 0%) 0.16 ( 0%) 0 kB ( 0%)
dead store elim1 : 0.25 ( 1%) 0.00 ( 0%) 0.25 ( 1%) 7807 kB ( 0%)
dead store elim2 : 0.15 ( 0%) 0.01 ( 0%) 0.13 ( 0%) 12268 kB ( 1%)
loop init : 0.30 ( 1%) 0.00 ( 0%) 0.32 ( 1%) 3678 kB ( 0%)
loop invariant motion : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 1 kB ( 0%)
loop fini : 0.01 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 kB ( 0%)
CPROP : 0.49 ( 1%) 0.00 ( 0%) 0.49 ( 1%) 12212 kB ( 1%)
PRE : 3.67 ( 9%) 0.04 ( 2%) 3.69 ( 9%) 17514 kB ( 1%)
CSE 2 : 0.28 ( 1%) 0.01 ( 0%) 0.29 ( 1%) 2791 kB ( 0%)
branch prediction : 0.13 ( 0%) 0.00 ( 0%) 0.13 ( 0%) 823 kB ( 0%)
combiner : 0.48 ( 1%) 0.00 ( 0%) 0.51 ( 1%) 19027 kB ( 1%)
if-conversion : 0.08 ( 0%) 0.00 ( 0%) 0.07 ( 0%) 3838 kB ( 0%)
integrated RA : 1.09 ( 3%) 0.00 ( 0%) 1.14 ( 3%) 72103 kB ( 4%)
LRA non-specific : 0.52 ( 1%) 0.01 ( 0%) 0.47 ( 1%) 3373 kB ( 0%)
LRA virtuals elimination : 0.14 ( 0%) 0.00 ( 0%) 0.11 ( 0%) 9546 kB ( 1%)
LRA reload inheritance : 0.04 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 822 kB ( 0%)
LRA create live ranges : 0.30 ( 1%) 0.00 ( 0%) 0.39 ( 1%) 1330 kB ( 0%)
LRA hard reg assignment : 0.01 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 0 kB ( 0%)
LRA rematerialization : 0.10 ( 0%) 0.00 ( 0%) 0.05 ( 0%) 0 kB ( 0%)
reload CSE regs : 0.49 ( 1%) 0.00 ( 0%) 0.53 ( 1%) 15987 kB ( 1%)
load CSE after reload : 2.73 ( 7%) 0.00 ( 0%) 2.72 ( 6%) 11924 kB ( 1%)
ree : 0.07 ( 0%) 0.00 ( 0%) 0.10 ( 0%) 74 kB ( 0%)
thread pro- & epilogue : 0.18 ( 0%) 0.00 ( 0%) 0.22 ( 1%) 446 kB ( 0%)
if-conversion 2 : 0.08 ( 0%) 0.00 ( 0%) 0.06 ( 0%) 23 kB ( 0%)
split paths : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 16 kB ( 0%)
combine stack adjustments : 0.01 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 0 kB ( 0%)
peephole 2 : 0.09 ( 0%) 0.00 ( 0%) 0.11 ( 0%) 878 kB ( 0%)
hard reg cprop : 0.19 ( 0%) 0.00 ( 0%) 0.15 ( 0%) 11 kB ( 0%)
scheduling 2 : 1.08 ( 3%) 0.02 ( 1%) 1.13 ( 3%) 8034 kB ( 0%)
machine dep reorg : 0.09 ( 0%) 0.00 ( 0%) 0.09 ( 0%) 2338 kB ( 0%)
reorder blocks : 0.33 ( 1%) 0.00 ( 0%) 0.31 ( 1%) 4597 kB ( 0%)
shorten branches : 0.09 ( 0%) 0.00 ( 0%) 0.10 ( 0%) 0 kB ( 0%)
final : 0.19 ( 0%) 0.00 ( 0%) 0.20 ( 0%) 21653 kB ( 1%)
tree if-combine : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 386 kB ( 0%)
straight-line strength reduction : 0.04 ( 0%) 0.00 ( 0%) 0.03 ( 0%) 26 kB ( 0%)
store merging : 0.02 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 868 kB ( 0%)
initialize rtl : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 12 kB ( 0%)
address lowering : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%)
early local passes : 0.01 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 0 kB ( 0%)
rest of compilation : 0.59 ( 1%) 0.00 ( 0%) 0.55 ( 1%) 7917 kB ( 0%)
remove unused locals : 0.17 ( 0%) 0.00 ( 0%) 0.16 ( 0%) 416 kB ( 0%)
address taken : 0.11 ( 0%) 0.01 ( 0%) 0.14 ( 0%) 0 kB ( 0%)
rebuild frequencies : 0.05 ( 0%) 0.00 ( 0%) 0.04 ( 0%) 138 kB ( 0%)
repair loop structures : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 0%) 0 kB ( 0%)
TOTAL : 40.32 2.26 42.69 1857845 kB
My problem is that I don't understand any of the terms in this report (ipa cp, tree eh, ...). I would at least like to understand what the phase opt and generate stage is, because it takes 94% of the compile time, so it's definitely what I should tackle.
The GCC documentation has almost no information about this option: https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html#Developer-Options
-ftime-report
Makes the compiler print some statistics about the time consumed by each pass when it finishes.
It's a bit surprising to have such a terse description for such a complicated option.

How to sort a 2d array in cpp by values of two columns?

I have a std::vector< std::vector<double> > array, the entries of which are
1 80 -0.15 -0.9 -0.15 0.6 0 -1.5
1 81 -0.15 -0.9 -0.15 0.7 0 -1.6
1 82 -0.15 -0.9 -0.15 0.8 0 -1.7
1 83 -0.15 -0.9 -0.15 0.9 0 -1.8
.
.
.
79 155 0.15 0.9 0.15 -0.9 0 1.8
79 156 0.15 0.9 0.15 -0.8 0 1.7
79 157 0.15 0.9 0.15 -0.7 0 1.6
79 158 0.15 0.9 0.15 -0.6 0 1.5
Each row has 8 elements. I want to sort the array by the 7th and 8th element using the std::sort function as
auto sortfunc = [](vector<double> va, vector<double> vb){ return (va[7] < vb[7] ) && (va[6]< vb[6] ); };
sort(array.begin(),array.end(), sortfunc );
The result is not a completely sorted array
3 153 -0.15 -0.7 0.1 -0.1 -0.25 -0.6
2 154 -0.15 -0.8 0.1 0 -0.25 -0.8
2 153 -0.15 -0.8 0.1 -0.1 -0.25 -0.7
2 152 -0.15 -0.8 0.1 -0.2 -0.25 -0.6
7 153 -0.1 -0.7 0.1 -0.1 -0.2 -0.6
7 154 -0.1 -0.7 0.1 0 -0.2 -0.7
.
.
.
74 94 0.1 0.8 -0.05 -0.5 0.15 1.3
74 95 0.1 0.8 -0.05 -0.4 0.15 1.2
74 96 0.1 0.8 -0.05 -0.3 0.15 1.1
74 97 0.1 0.8 -0.05 -0.2 0.15 1
77 100 0.15 0.7 -0.05 0.1 0.2 0.6
77 99 0.15 0.7 -0.05 0 0.2 0.7
This doesn't give me an array sorted by the given condition: the elements in the 7th and 8th columns don't appear in any particular order.
What am I doing wrong here?
Github Gist for the arrays is here
Your sort criteria looks off. I think you need something more like this:
auto sortfunc = [](std::vector<double> const& va, std::vector<double> const& vb)
{
if(va[7] == vb[7])
return va[6] < vb[6];
return va[7] < vb[7];
};
Sort by the first column unless the first column is equal in which case sort according to the second column.
sortfunc does not implement the requirements of Compare as it does not have a strict weak ordering. It therefore causes undefined behaviour when used with std::sort.
If you want to compare multiple values the easiest way is to use std::tuple which automatically compares the first value then only compares the second value if the first matches:
auto sortfunc = [](const vector<double>& va, const vector<double>& vb){ return std::tie(va[6], va[7]) < std::tie(vb[6], vb[7]); };

Finding the Mode in a Vector of Floats

I am trying to find the mode average in a vector containing 324 float values.
The code I have is as follows:
float max = vec.back();
float prev = max;
float mode = 0.0;
int maxcount = 0;
int currcount = 0;
for (const auto n : vec) {
if (n == prev) {
++currcount;
if (currcount > maxcount) {
maxcount = currcount;
mode = n;
}
} else {
currcount = 1;
}
prev = n;
}
std::cout << mode << std::endl;
This prints out the mode to be 0.75, which is wrong.
Here are all the float values, they come from a txt file so please excuse the format:
0.61 0.61 0.61 0.62 0.62 0.62 0.62 0.62 0.62 0.62 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.71 0.71 0.71 0.71 0.71 0.71 0.71 0.71 0.71 0.71 0.71 0.71 0.71 0.71 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.73 0.73 0.73 0.73 0.73 0.73 0.73 0.73 0.73 0.73 0.73 0.73 0.73 0.74 0.74 0.74 0.74 0.74 0.74 0.74 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.76 0.76 0.76 0.76 0.76 0.76 0.76 0.76 0.76 0.77 0.77 0.77 0.77 0.77 0.77 0.77 0.78 0.78 0.78 0.78 0.78 0.78 0.78 0.78 0.78 0.78 0.79 0.79 0.79 0.79 0.79 0.79 0.79 0.79 0.79 0.79 0.79
Excel presents the mode as 0.65. Why does my code not produce the same result? What do I need to change?
Many thanks.
edit: Through debugging I have found that the values within vec are more like 0.68000000000000005 and 0.69999999999999996, though some still have only two decimal places (0.64, 0.74, etc.). Could this be the issue? Can I round the values for this particular calculation?
The problem might be the use of floats for comparison. Because of how they are stored, floating point numbers differ, in general, from the value they are initialized to by a small amount.
Instead of using n == prev, consider a comparison within some small epsilon that is greater than the machine precision (for any machine you expect to run this code on) but less than the smallest true difference between any two of your numbers (which looks like 0.01). So you could do
if (((n - prev) < EPSILON) && ((prev - n) < EPSILON)) { ... }
for float EPSILON = 0.000001, or a value that makes sense for you.
See also this question on comparing floats. Of note is that the ideal epsilon would change if your data set changed to much larger or much smaller numbers.
Even if there is another problem in your code, you might consider moving away from comparing floats in general.
By debugging I found that my values did not all have exactly two decimal places, so the mode was actually 0.7500000000004 but was still being printed as 0.75.
By adding a rounding function call and removing the const, I was able to find the mode to two decimal places.
for (auto n : vec)
{
n = roundf(n * 100) / 100;
if (n == prev)
{
++currcount;
if (currcount > maxcount)
{
maxcount = currcount;
mode = n;
}
} else
{
currcount = 1;
}
prev = n;
}

Code slower despite what gprof says

I've been given some C++ code to optimize, and the first step is to introduce parallelism with OpenMP. I was able to identify several functions that badly needed optimization, so I focused on them.
The problem is that the execution time has roughly doubled, whereas the profiling output seems to tell me that it should be much faster.
Here is the gprof profile I get without using OpenMP:
38.07 5.55 5.55 __tcf_0
20.99 8.61 3.06 86196302 0.04 0.04 is_neighbor(int, int, int, int, double)
13.24 10.54 1.93 425940 4.53 4.53 Ellips::data_fiting(double*, int, int, double) const
9.05 11.86 1.32 _fu51___ZSt4cout
5.90 12.72 0.86 5645243 0.15 0.15 Ellips::Ellips()
3.70 13.26 0.54 4013067 0.13 0.13 intersect(Ellips&, Ellips&)
2.40 13.61 0.35 dgemv_
1.44 13.82 0.21 ddot_
1.23 14.00 0.18 141257881 0.00 0.00 Configuration::get_position(int)
1.03 14.15 0.15 __tcf_0
0.82 14.27 0.12 594893 0.20 0.20 Ellips::Ellips(double, double, int, int)
0.41 14.33 0.06 7099 8.45 400.75 Configuration::Configuration(double, double, int, int, int, int, double*, double)
0.34 14.38 0.05 3203279 0.02 0.02 Ellips::operator=(Ellips const&)
0.34 14.43 0.05 ceil
0.21 14.46 0.03 dnrm2_
0.14 14.48 0.02 _fu32___ZSt4cout
0.14 14.50 0.02 dcopy_
0.14 14.52 0.02 dscal_
0.07 14.53 0.01 7775127 0.00 0.00 Configuration::get_Ellips(int)
0.07 14.54 0.01 6239588 0.00 0.00 Ellips::~Ellips()
0.07 14.55 0.01 4349523 0.00 0.00 Configuration::get_data_fit(int)
0.07 14.56 0.01 7097 1.41 1.41 Graph<float, float, float>::maxflow(bool, Block<int>*)
0.07 14.57 0.01 _fu53___ZNSs4_Rep20_S_empty_rep_storageE
0.07 14.58 0.01 floor
0.00 14.58 0.00 432232036 0.00 0.00 Configuration::save_config(std::string)
0.00 14.58 0.00 1180034 0.00 0.00 Ellips::data_fiting(double, double*, double*, double, int, int, double) const
0.00 14.58 0.00 1173980 0.00 0.00 Ellips::get_cx() const
0.00 14.58 0.00 1164513 0.00 0.02 Configuration::add_Ellips(Ellips const&, int, double)
0.00 14.58 0.00 1157360 0.00 0.00 Ellips::get_cy() const
0.00 14.58 0.00 425940 0.00 0.00 shift_cost_exp1(double, double)
0.00 14.58 0.00 23625 0.00 0.00 Graph<float, float, float>::augment(Graph<float, float, float>::arc*)
0.00 14.58 0.00 22504 0.00 0.00 Graph<float, float, float>::process_sink_orphan(Graph<float, float, float>::node*)
0.00 14.58 0.00 21293 0.00 27.35 Configuration::operator=(Configuration const&)
0.00 14.58 0.00 14203 0.00 0.23 Configuration::~Configuration()
0.00 14.58 0.00 14196 0.00 0.00 Configuration::get_nb_Ellipses()
0.00 14.58 0.00 7097 0.00 34.30 Configuration::Configuration(Ellips const&, int, double, int)
0.00 14.58 0.00 7097 0.00 0.00 Graph<float, float, float>::maxflow_init()
0.00 14.58 0.00 7097 0.00 0.00 Graph<float, float, float>::reset()
0.00 14.58 0.00 2406 0.00 0.00 Ellips::get_a() const
0.00 14.58 0.00 2406 0.00 0.00 Ellips::get_b() const
0.00 14.58 0.00 2406 0.00 0.00 Ellips::get_theta() const
0.00 14.58 0.00 1137 0.00 0.00 Graph<float, float, float>::process_source_orphan(Graph<float, float, float>::node*)
0.00 14.58 0.00 7 0.00 38.00 Configuration::Configuration(Configuration const&)
0.00 14.58 0.00 3 0.00 0.32 Configuration::Configuration()
0.00 14.58 0.00 2 0.00 0.00 min_max_val(_IplImage*, double&, double&)
0.00 14.58 0.00 1 0.00 0.00 convert_char_to_double(_IplImage*, double*)
0.00 14.58 0.00 1 0.00 0.00 Graph<float, float, float>::reallocate_nodes(int)
0.00 14.58 0.00 1 0.00 0.00 Graph<float, float, float>::Graph(int, int, void (*)(char*))
And here is the one I get with OpenMP (The code is a recursive algorithm with no real "ending", the two profiles have been obtained after about 7000 iterations of the main loop).
36.57 4.45 4.45 __tcf_0
25.72 7.58 3.13 86434458 0.04 0.04 is_neighbor(int, int, int, int, double)
12.41 9.09 1.51 _fu51___ZSt4cout
7.97 10.06 0.97 5646276 0.17 0.17 Ellips::Ellips()
4.35 10.59 0.53 4020048 0.13 0.13 intersect(Ellips&, Ellips&)
2.47 10.89 0.30 dgemv_
1.73 11.10 0.21 ddot_
1.64 11.30 0.20 141852099 0.00 0.00 Configuration::get_position(int)
1.15 11.44 0.14 7038 19.89 164.95 Configuration::Configuration(double, double, int, int, int, int, double*, double)
1.07 11.57 0.13 589659 0.22 0.22 Ellips::Ellips(double, double, int, int)
0.99 11.69 0.12 __tcf_0
0.90 11.80 0.11 422280 0.26 0.33 Ellips::data_fiting(double*, int, int, double) const
0.74 11.89 0.09 3208793 0.03 0.03 Ellips::operator=(Ellips const&)
0.41 11.94 0.05 ceil
0.25 11.97 0.03 422280 0.07 0.07 shift_cost_exp1(double, double)
0.25 12.00 0.03 GOMP_parallel_end
0.25 12.03 0.03 _fu53___ZNSs4_Rep20_S_empty_rep_storageE
0.16 12.05 0.02 21110 0.95 32.56 Configuration::operator=(Configuration const&)
0.16 12.07 0.02 7036 2.84 2.84 Graph<float, float, float>::maxflow(bool, Block<int>*)
0.16 12.09 0.02 _fu32___ZSt4cout
0.16 12.11 0.02 daxpy_
0.16 12.13 0.02 dnrm2_
0.08 12.14 0.01 1171018 0.01 0.04 Configuration::add_Ellips(Ellips const&, int, double)
0.08 12.15 0.01 GOMP_parallel_start
0.08 12.16 0.01 dcopy_
0.08 12.17 0.01 dgemm_
0.00 12.17 0.00 432088679 0.00 0.00 Configuration::save_config(std::string)
0.00 12.17 0.00 7813683 0.00 0.00 Configuration::get_Ellips(int)
0.00 12.17 0.00 6235383 0.00 0.00 Ellips::~Ellips()
0.00 12.17 0.00 4360587 0.00 0.00 Configuration::get_data_fit(int)
0.00 12.17 0.00 1187310 0.00 0.00 Ellips::data_fiting(double, double*, double*, double, int, int, double) const
0.00 12.17 0.00 1163572 0.00 0.00 Ellips::get_cx() const
0.00 12.17 0.00 1147536 0.00 0.00 Ellips::get_cy() const
0.00 12.17 0.00 35748 0.00 0.00 Graph<float, float, float>::augment(Graph<float, float, float>::arc*)
0.00 12.17 0.00 33436 0.00 0.00 Graph<float, float, float>::process_sink_orphan(Graph<float, float, float>::node*)
0.00 12.17 0.00 14081 0.00 0.00 Configuration::~Configuration()
0.00 12.17 0.00 14074 0.00 0.00 Configuration::get_nb_Ellipses()
0.00 12.17 0.00 7036 0.00 39.10 Configuration::Configuration(Ellips const&, int, double, int)
0.00 12.17 0.00 7036 0.00 0.00 Graph<float, float, float>::maxflow_init()
0.00 12.17 0.00 7036 0.00 0.00 Graph<float, float, float>::reset()
0.00 12.17 0.00 2424 0.00 0.00 Ellips::get_a() const
0.00 12.17 0.00 2424 0.00 0.00 Ellips::get_b() const
0.00 12.17 0.00 2424 0.00 0.00 Ellips::get_theta() const
0.00 12.17 0.00 2355 0.00 0.00 Graph<float, float, float>::process_source_orphan(Graph<float, float, float>::node*)
0.00 12.17 0.00 7 0.00 44.91 Configuration::Configuration(Configuration const&)
0.00 12.17 0.00 3 0.00 0.37 Configuration::Configuration()
0.00 12.17 0.00 2 0.00 0.00 min_max_val(_IplImage*, double&, double&)
0.00 12.17 0.00 1 0.00 0.00 convert_char_to_double(_IplImage*, double*)
0.00 12.17 0.00 1 0.00 0.00 Graph<float, float, float>::reallocate_nodes(int)
0.00 12.17 0.00 1 0.00 0.00 Graph<float, float, float>::Graph(int, int, void (*)(char*))
Is there a problem with how I'm using the profiler? Or does this come from the code itself? It takes about 12 seconds to complete 1000 iterations without OpenMP, whereas it takes about 31 seconds with OpenMP (measured using omp_get_wtime() and not clock()).

Extracting specific lines of data from a log file

I'm looking to extract and print a specific line from a table I have in a long log file. It looks something like this:
******************************************************************************
XSCALE (VERSION July 4, 2012) 4-Jun-2013
******************************************************************************
Author: Wolfgang Kabsch
Copy licensed until 30-Jun-2013 to
academic users for non-commercial applications
No redistribution.
******************************************************************************
CONTROL CARDS
******************************************************************************
MAXIMUM_NUMBER_OF_PROCESSORS=16
RESOLUTION_SHELLS= 20 10 6 4 3 2.5 2.0 1.9 1.8 1.7 1.6 1.5 1.4 1.3 1.2 1.1 1.0 0.9 0.8
MINIMUM_I/SIGMA=4.0
OUTPUT_FILE=fae-ip.ahkl
INPUT_FILE= /dls/sci-scratch/Sam/FC59251/fr6_1/XDS_ASCII.HKL
THE DATA COLLECTION STATISTICS REPORTED BELOW ASSUMES:
SPACE_GROUP_NUMBER= 97
UNIT_CELL_CONSTANTS= 128.28 128.28 181.47 90.000 90.000 90.000
***** 16 EQUIVALENT POSITIONS IN SPACE GROUP # 97 *****
If x',y',z' is an equivalent position to x,y,z, then
x'=x*ML(1)+y*ML( 2)+z*ML( 3)+ML( 4)/12.0
y'=x*ML(5)+y*ML( 6)+z*ML( 7)+ML( 8)/12.0
z'=x*ML(9)+y*ML(10)+z*ML(11)+ML(12)/12.0
# 1 2 3 4 5 6 7 8 9 10 11 12
1 1 0 0 0 0 1 0 0 0 0 1 0
2 -1 0 0 0 0 -1 0 0 0 0 1 0
3 -1 0 0 0 0 1 0 0 0 0 -1 0
4 1 0 0 0 0 -1 0 0 0 0 -1 0
5 0 1 0 0 1 0 0 0 0 0 -1 0
6 0 -1 0 0 -1 0 0 0 0 0 -1 0
7 0 -1 0 0 1 0 0 0 0 0 1 0
8 0 1 0 0 -1 0 0 0 0 0 1 0
9 1 0 0 6 0 1 0 6 0 0 1 6
10 -1 0 0 6 0 -1 0 6 0 0 1 6
11 -1 0 0 6 0 1 0 6 0 0 -1 6
12 1 0 0 6 0 -1 0 6 0 0 -1 6
13 0 1 0 6 1 0 0 6 0 0 -1 6
14 0 -1 0 6 -1 0 0 6 0 0 -1 6
15 0 -1 0 6 1 0 0 6 0 0 1 6
16 0 1 0 6 -1 0 0 6 0 0 1 6
ALL DATA SETS WILL BE SCALED TO /dls/sci-scratch/Sam/FC59251/fr6_1/XDS_ASCII.HKL
******************************************************************************
READING INPUT REFLECTION DATA FILES
******************************************************************************
DATA MEAN REFLECTIONS INPUT FILE NAME
SET# INTENSITY ACCEPTED REJECTED
1 0.1358E+03 1579957 0 /dls/sci-scratch/Sam/FC59251/fr6_1/XDS_ASCII.HKL
******************************************************************************
CORRECTION FACTORS AS FUNCTION OF IMAGE NUMBER & RESOLUTION
******************************************************************************
RECIPROCAL CORRECTION FACTORS FOR INPUT DATA SETS MERGED TO
OUTPUT FILE: fae-ip.ahkl
THE CALCULATIONS ASSUME FRIEDEL'S_LAW= TRUE
TOTAL NUMBER OF CORRECTION FACTORS DEFINED 720
DEGREES OF FREEDOM OF CHI^2 FIT 357222.9
CHI^2-VALUE OF FIT OF CORRECTION FACTORS 1.024
NUMBER OF CYCLES CARRIED OUT 4
CORRECTION FACTORS for visual inspection by XDS-Viewer DECAY_001.cbf
XMIN= 0.6 XMAX= 1799.3 NXBIN= 36
YMIN= 0.00049 YMAX= 0.44483 NYBIN= 20
NUMBER OF REFLECTIONS USED FOR DETERMINING CORRECTION FACTORS 396046
******************************************************************************
CORRECTION FACTORS AS FUNCTION OF X (fast) & Y(slow) IN THE DETECTOR PLANE
******************************************************************************
RECIPROCAL CORRECTION FACTORS FOR INPUT DATA SETS MERGED TO
OUTPUT FILE: fae-ip.ahkl
THE CALCULATIONS ASSUME FRIEDEL'S_LAW= TRUE
TOTAL NUMBER OF CORRECTION FACTORS DEFINED 7921
DEGREES OF FREEDOM OF CHI^2 FIT 356720.6
CHI^2-VALUE OF FIT OF CORRECTION FACTORS 1.023
NUMBER OF CYCLES CARRIED OUT 3
CORRECTION FACTORS for visual inspection by XDS-Viewer MODPIX_001.cbf
XMIN= 5.4 XMAX= 2457.6 NXBIN= 89
YMIN= 40.0 YMAX= 2516.7 NYBIN= 89
NUMBER OF REFLECTIONS USED FOR DETERMINING CORRECTION FACTORS 396046
******************************************************************************
CORRECTION FACTORS AS FUNCTION OF IMAGE NUMBER & DETECTOR SURFACE POSITION
******************************************************************************
RECIPROCAL CORRECTION FACTORS FOR INPUT DATA SETS MERGED TO
OUTPUT FILE: fae-ip.ahkl
THE CALCULATIONS ASSUME FRIEDEL'S_LAW= TRUE
TOTAL NUMBER OF CORRECTION FACTORS DEFINED 468
DEGREES OF FREEDOM OF CHI^2 FIT 357286.9
CHI^2-VALUE OF FIT OF CORRECTION FACTORS 1.022
NUMBER OF CYCLES CARRIED OUT 3
CORRECTION FACTORS for visual inspection by XDS-Viewer ABSORP_001.cbf
XMIN= 0.6 XMAX= 1799.3 NXBIN= 36
DETECTOR_SURFACE_POSITION= 1232 1278
DETECTOR_SURFACE_POSITION= 1648 1699
DETECTOR_SURFACE_POSITION= 815 1699
DETECTOR_SURFACE_POSITION= 815 858
DETECTOR_SURFACE_POSITION= 1648 858
DETECTOR_SURFACE_POSITION= 2174 1673
DETECTOR_SURFACE_POSITION= 1622 2230
DETECTOR_SURFACE_POSITION= 841 2230
DETECTOR_SURFACE_POSITION= 289 1673
DETECTOR_SURFACE_POSITION= 289 884
DETECTOR_SURFACE_POSITION= 841 326
DETECTOR_SURFACE_POSITION= 1622 326
DETECTOR_SURFACE_POSITION= 2174 884
NUMBER OF REFLECTIONS USED FOR DETERMINING CORRECTION FACTORS 396046
******************************************************************************
CORRECTION PARAMETERS FOR THE STANDARD ERROR OF REFLECTION INTENSITIES
******************************************************************************
The variance v0(I) of the intensity I obtained from counting statistics is
replaced by v(I)=a*(v0(I)+b*I^2). The model parameters a, b are chosen to
minimize the discrepancies between v(I) and the variance estimated from
sample statistics of symmetry related reflections. This model implicates
an asymptotic limit ISa=1/SQRT(a*b) for the highest I/Sigma(I) that the
experimental setup can produce (Diederichs (2010) Acta Cryst D66, 733-740).
Often the value of ISa is reduced from the initial value ISa0 due to systematic
errors showing up by comparison with other data sets in the scaling procedure.
(ISa=ISa0=-1 if v0 is unknown for a data set.)
a b ISa ISa0 INPUT DATA SET
1.086E+00 1.420E-03 25.46 29.00 /dls/sci-scratch/Sam/FC59251/fr6_1/XDS_ASCII.HKL
FACTOR TO PLACE ALL DATA SETS TO AN APPROXIMATE ABSOLUTE SCALE 0.4178E+04
(ASSUMING A PROTEIN WITH 50% SOLVENT)
******************************************************************************
STATISTICS OF SCALED OUTPUT DATA SET : fae-ip.ahkl
FILE TYPE: XDS_ASCII MERGE=FALSE FRIEDEL'S_LAW=TRUE
186 OUT OF 1579957 REFLECTIONS REJECTED
1579771 REFLECTIONS ON OUTPUT FILE
******************************************************************************
DEFINITIONS:
R-FACTOR
observed = (SUM(ABS(I(h,i)-I(h))))/(SUM(I(h,i)))
expected = expected R-FACTOR derived from Sigma(I)
COMPARED = number of reflections used for calculating R-FACTOR
I/SIGMA = mean of intensity/Sigma(I) of unique reflections
(after merging symmetry-related observations)
Sigma(I) = standard deviation of reflection intensity I
estimated from sample statistics
R-meas = redundancy independent R-factor (intensities)
Diederichs & Karplus (1997), Nature Struct. Biol. 4, 269-275.
CC(1/2) = percentage of correlation between intensities from
random half-datasets. Correlation significant at
the 0.1% level is marked by an asterisk.
Karplus & Diederichs (2012), Science 336, 1030-33
Anomal = percentage of correlation between random half-sets
Corr of anomalous intensity differences. Correlation
significant at the 0.1% level is marked.
SigAno = mean anomalous difference in units of its estimated
standard deviation (|F(+)-F(-)|/Sigma). F(+), F(-)
are structure factor estimates obtained from the
merged intensity observations in each parity class.
Nano = Number of unique reflections used to calculate
Anomal_Corr & SigAno. At least two observations
for each (+ and -) parity are required.
SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
RESOLUTION NUMBER OF REFLECTIONS COMPLETENESS R-FACTOR R-FACTOR COMPARED I/SIGMA R-meas CC(1/2) Anomal SigAno Nano
LIMIT OBSERVED UNIQUE POSSIBLE OF DATA observed expected Corr
20.00 557 66 74 89.2% 2.7% 3.0% 557 58.75 2.9% 100.0* 45 1.674 25
10.00 5018 417 417 100.0% 2.4% 3.1% 5018 75.34 2.6% 100.0* 2 0.812 276
6.00 18352 1583 1584 99.9% 2.8% 3.3% 18351 65.55 2.9% 100.0* 11* 0.914 1248
4.00 59691 4640 4640 100.0% 3.2% 3.5% 59690 64.96 3.4% 100.0* 4 0.857 3987
3.00 112106 8821 8822 100.0% 4.4% 4.4% 112102 50.31 4.6% 99.9* -3 0.844 7906
2.50 147954 11023 11023 100.0% 8.7% 8.6% 147954 29.91 9.1% 99.8* 0 0.829 10096
2.00 332952 24698 24698 100.0% 21.4% 21.6% 332949 14.32 22.3% 99.2* 1 0.804 22992
1.90 106645 8382 8384 100.0% 56.5% 57.1% 106645 5.63 58.8% 94.7* -2 0.767 7886
1.80 138516 10342 10343 100.0% 86.8% 87.0% 138516 3.64 90.2% 87.9* -2 0.762 9741
1.70 175117 12897 12899 100.0% 140.0% 140.1% 175116 2.15 145.4% 69.6* -2 0.732 12188
1.60 209398 16298 16304 100.0% 206.1% 208.5% 209397 1.35 214.6% 48.9* -2 0.693 15466
1.50 273432 20770 20893 99.4% 333.4% 342.1% 273340 0.80 346.9% 23.2* -1 0.644 19495
1.40 33 27 27248 0.1% 42.6% 112.7% 12 0.40 60.3% 88.2 0 0.000 0
1.30 0 0 36205 0.0% -99.9% -99.9% 0 -99.00 -99.9% 0.0 0 0.000 0
1.20 0 0 49238 0.0% -99.9% -99.9% 0 -99.00 -99.9% 0.0 0 0.000 0
1.10 0 0 68746 0.0% -99.9% -99.9% 0 -99.00 -99.9% 0.0 0 0.000 0
1.00 0 0 98884 0.0% -99.9% -99.9% 0 -99.00 -99.9% 0.0 0 0.000 0
0.90 0 0 147505 0.0% -99.9% -99.9% 0 -99.00 -99.9% 0.0 0 0.000 0
0.80 0 0 230396 0.0% -99.9% -99.9% 0 -99.00 -99.9% 0.0 0 0.000 0
total 1579771 119964 778303 15.4% 12.8% 13.1% 1579647 14.33 13.4% 99.9* -1 0.755 111306
========== STATISTICS OF INPUT DATA SET ==========
R-FACTORS FOR INTENSITIES OF DATA SET /dls/sci-scratch/Sam/FC59251/fr6_1/XDS_ASCII.HKL
RESOLUTION R-FACTOR R-FACTOR COMPARED
LIMIT observed expected
20.00 2.7% 3.0% 557
10.00 2.4% 3.1% 5018
6.00 2.8% 3.3% 18351
4.00 3.2% 3.5% 59690
3.00 4.4% 4.4% 112102
2.50 8.7% 8.6% 147954
2.00 21.4% 21.6% 332949
1.90 56.5% 57.1% 106645
1.80 86.8% 87.0% 138516
1.70 140.0% 140.1% 175116
1.60 206.1% 208.5% 209397
1.50 333.4% 342.1% 273340
1.40 42.6% 112.7% 12
1.30 -99.9% -99.9% 0
1.20 -99.9% -99.9% 0
1.10 -99.9% -99.9% 0
1.00 -99.9% -99.9% 0
0.90 -99.9% -99.9% 0
0.80 -99.9% -99.9% 0
total 12.8% 13.1% 1579647
******************************************************************************
WILSON STATISTICS OF SCALED DATA SET: fae-ip.ahkl
******************************************************************************
Data is divided into resolution shells and a straight line
A - 2*B*SS is fitted to log<I>, where
RES = mean resolution (Angstrom) in shell
SS = mean of (sin(THETA)/LAMBDA)**2 in shell
<I> = mean reflection intensity in shell
BO = (A - log<I>)/(2*SS)
# = number of reflections in resolution shell
WILSON LINE (using all data) : A= 14.997 B= 29.252 CORRELATION= 0.99
# RES SS <I> log(<I>) BO
1667 8.445 0.004 2.3084E+06 14.652 49.2
2798 5.260 0.009 1.5365E+06 14.245 41.6
3547 4.106 0.015 2.0110E+06 14.514 16.3
4147 3.480 0.021 1.2910E+06 14.071 22.4
4688 3.073 0.026 7.3586E+05 13.509 28.1
5154 2.781 0.032 4.6124E+05 13.042 30.3
5568 2.560 0.038 3.1507E+05 12.661 30.6
5966 2.384 0.044 2.4858E+05 12.424 29.2
6324 2.240 0.050 1.8968E+05 12.153 28.5
6707 2.119 0.056 1.3930E+05 11.844 28.3
7030 2.016 0.062 9.1378E+04 11.423 29.0
7331 1.926 0.067 5.4413E+04 10.904 30.4
7664 1.848 0.073 3.5484E+04 10.477 30.9
7934 1.778 0.079 2.4332E+04 10.100 31.0
8193 1.716 0.085 1.8373E+04 9.819 30.5
8466 1.660 0.091 1.4992E+04 9.615 29.7
8743 1.609 0.097 1.1894E+04 9.384 29.1
9037 1.562 0.102 9.4284E+03 9.151 28.5
9001 1.520 0.108 8.3217E+03 9.027 27.6
HIGHER ORDER MOMENTS OF WILSON DISTRIBUTION OF CENTRIC DATA
AS COMPARED WITH THEORETICAL VALUES. (EXPECTED: 1.00)
# RES <I**2>/ <I**3>/ <I**4>/
3<I>**2 15<I>**3 105<I>**4
440 8.445 0.740 0.505 0.294
442 5.260 0.762 0.733 0.735
442 4.106 0.888 0.788 0.717
439 3.480 1.339 1.733 2.278
438 3.073 1.168 1.259 1.400
440 2.781 1.215 1.681 2.269
438 2.560 1.192 1.603 2.405
450 2.384 1.117 1.031 0.891
432 2.240 1.214 1.567 2.173
438 2.119 0.972 0.992 0.933
445 2.016 1.029 1.019 0.986
441 1.926 1.603 1.701 1.554
440 1.848 1.544 1.871 2.076
436 1.778 0.927 0.661 0.435
444 1.716 1.134 1.115 1.197
440 1.660 1.271 1.618 2.890
436 1.609 1.424 1.045 0.941
448 1.562 1.794 1.447 1.423
426 1.520 2.517 1.496 2.099
8355 overall 1.253 1.255 1.455
HIGHER ORDER MOMENTS OF WILSON DISTRIBUTION OF ACENTRIC DATA
AS COMPARED WITH THEORETICAL VALUES. (EXPECTED: 1.00)
# RES <I**2>/ <I**3>/ <I**4>/
2<I>**2 6<I>**3 24<I>**4
1227 8.445 1.322 1.803 2.340
2356 5.260 1.167 1.420 1.789
3105 4.106 1.010 1.046 1.100
3708 3.480 1.055 1.262 1.592
4250 3.073 0.999 1.083 1.375
4714 2.781 1.061 1.232 1.591
5130 2.560 1.049 1.178 1.440
5516 2.384 1.025 1.117 1.290
5892 2.240 1.001 1.058 1.230
6269 2.119 1.060 1.140 1.233
6585 2.016 1.109 1.344 1.709
6890 1.926 1.028 1.100 1.222
7224 1.848 1.060 1.150 1.348
7498 1.778 1.143 1.309 1.655
7749 1.716 1.182 1.299 1.549
8026 1.660 1.286 1.376 1.538
8307 1.609 1.419 1.481 1.707
8589 1.562 1.663 1.750 2.119
8575 1.520 2.271 2.172 5.088
111610 overall 1.253 1.354 1.804
======= CUMULATIVE INTENSITY DISTRIBUTION =======
DEFINITIONS:
<I> = mean reflection intensity
Na(Z)exp = expected number of acentric reflections with I <= Z*<I>
Na(Z)obs = observed number of acentric reflections with I <= Z*<I>
Nc(Z)exp = expected number of centric reflections with I <= Z*<I>
Nc(Z)obs = observed number of centric reflections with I <= Z*<I>
Nc(Z)obs/Nc(Z)exp versus resolution and Z (0.1-1.0)
# RES 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
440 8.445 0.75 0.95 0.98 1.00 0.98 0.99 1.00 1.00 1.02 1.02
442 5.260 1.18 1.11 1.09 1.09 1.07 1.08 1.08 1.08 1.07 1.06
442 4.106 0.97 1.01 0.98 0.97 0.96 0.94 0.92 0.91 0.92 0.94
439 3.480 0.91 0.88 0.91 0.91 0.89 0.90 0.90 0.89 0.89 0.93
438 3.073 0.92 0.92 0.90 0.93 0.94 0.99 1.02 0.99 0.96 0.96
440 2.781 0.98 1.01 1.02 1.05 1.04 1.03 1.04 1.02 1.01 1.01
438 2.560 1.02 1.10 1.05 1.03 1.01 1.03 1.04 1.01 1.04 1.02
450 2.384 0.78 0.93 0.92 0.93 0.89 0.89 0.92 0.95 0.96 0.95
432 2.240 0.69 0.82 0.84 0.86 0.91 0.92 0.93 0.94 0.95 0.95
438 2.119 0.75 0.87 0.95 1.02 1.09 1.09 1.12 1.12 1.10 1.08
445 2.016 0.86 0.86 0.87 0.90 0.91 0.93 0.98 0.99 1.00 1.00
441 1.926 0.88 0.79 0.79 0.81 0.82 0.84 0.85 0.85 0.86 0.86
440 1.848 1.00 0.89 0.85 0.83 0.85 0.85 0.88 0.90 0.90 0.92
436 1.778 1.03 0.87 0.79 0.79 0.80 0.84 0.85 0.87 0.90 0.92
444 1.716 1.09 0.85 0.81 0.78 0.80 0.80 0.81 0.81 0.84 0.85
440 1.660 1.27 1.01 0.93 0.88 0.85 0.84 0.84 0.85 0.88 0.91
436 1.609 1.34 1.00 0.89 0.83 0.80 0.80 0.80 0.81 0.80 0.83
448 1.562 1.39 1.09 0.93 0.86 0.81 0.78 0.77 0.79 0.78 0.78
426 1.520 1.38 1.03 0.88 0.83 0.82 0.80 0.78 0.76 0.75 0.74
8355 overall 1.01 0.95 0.92 0.91 0.91 0.91 0.92 0.92 0.93 0.93
Na(Z)obs/Na(Z)exp versus resolution and Z (0.1-1.0)
# RES 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
1227 8.445 1.10 1.22 1.21 1.21 1.14 1.10 1.12 1.10 1.11 1.09
2356 5.260 1.15 1.10 1.09 1.03 1.03 1.03 1.01 1.01 1.01 1.00
3105 4.106 0.91 0.96 0.99 1.01 1.02 1.00 1.00 0.99 0.99 1.00
3708 3.480 0.93 0.97 1.00 1.06 1.05 1.04 1.04 1.04 1.04 1.05
4250 3.073 0.94 1.02 1.01 1.00 1.01 1.00 1.00 1.01 1.02 1.02
4714 2.781 1.11 1.04 1.02 1.02 1.02 1.01 1.01 1.01 1.00 1.00
5130 2.560 1.00 1.10 1.06 1.03 1.01 1.02 1.01 1.01 1.01 1.02
5516 2.384 1.09 1.08 1.05 1.04 1.04 1.02 1.01 1.01 1.01 1.01
5892 2.240 0.98 0.99 1.00 1.01 1.01 1.01 1.00 1.00 1.00 1.00
6269 2.119 1.14 1.04 1.02 1.00 1.00 1.00 1.01 1.02 1.02 1.01
6585 2.016 1.17 1.02 1.01 1.02 1.02 1.03 1.02 1.02 1.02 1.02
6890 1.926 1.35 1.07 1.00 0.99 1.00 1.01 1.01 1.00 1.00 1.01
7224 1.848 1.52 1.11 1.01 0.97 0.96 0.98 0.98 0.98 0.98 0.99
7498 1.778 1.80 1.22 1.03 0.97 0.95 0.94 0.95 0.95 0.95 0.96
7749 1.716 2.01 1.28 1.07 0.99 0.94 0.92 0.92 0.92 0.93 0.93
8026 1.660 2.31 1.41 1.13 1.01 0.95 0.92 0.90 0.89 0.89 0.89
8307 1.609 2.62 1.54 1.19 1.04 0.95 0.90 0.88 0.87 0.86 0.87
8589 1.562 2.94 1.69 1.29 1.10 1.00 0.93 0.89 0.86 0.85 0.85
8575 1.520 3.14 1.78 1.34 1.13 1.01 0.93 0.88 0.85 0.83 0.83
111610 overall 1.73 1.24 1.09 1.03 0.99 0.97 0.96 0.96 0.96 0.96
List of 33 reflections *NOT* obeying Wilson distribution (Z> 10.0)
h k l RES Z Intensity Sigma
72 11 61 1.52 17.34 0.2886E+06 0.2367E+05 "alien"
67 53 6 1.50 15.85 0.2638E+06 0.1128E+06 "alien"
35 10 25 3.17 14.39 0.2118E+08 0.2364E+06 "alien"
46 17 99 1.50 14.16 0.2357E+06 0.9588E+05 "alien"
34 32 2 2.75 13.44 0.1239E+08 0.1279E+06 "alien"
79 6 15 1.60 13.10 0.3117E+06 0.2477E+05 "alien"
61 20 33 1.88 12.54 0.8900E+06 0.3054E+05 "alien"
44 4 48 2.30 12.38 0.4695E+07 0.6072E+05 "alien"
66 25 19 1.79 11.89 0.5788E+06 0.2739E+05 "alien"
66 25 11 1.81 11.88 0.5781E+06 0.2771E+05 "alien"
60 43 61 1.50 11.77 0.1959E+06 0.9769E+05 "alien"
72 11 17 1.74 11.64 0.4278E+06 0.2619E+05 "alien"
80 24 26 1.50 11.41 0.1899E+06 0.9793E+05 "alien"
41 21 26 2.59 11.09 0.6988E+07 0.7945E+05 "alien"
44 18 20 2.59 11.08 0.6982E+07 0.7839E+05 "alien"
23 3 62 2.59 11.06 0.6971E+07 0.9154E+05 "alien"
69 7 22 1.80 11.06 0.5383E+06 0.2564E+05 "alien"
73 10 15 1.72 10.98 0.4036E+06 0.2356E+05 "alien"
70 17 35 1.68 10.96 0.3286E+06 0.2415E+05 "alien"
57 24 41 1.88 10.91 0.7746E+06 0.2842E+05 "alien"
82 24 6 1.50 10.74 0.1787E+06 0.1019E+06 "alien"
69 25 62 1.50 10.67 0.1775E+06 0.8689E+05 "alien"
24 20 44 2.91 10.45 0.9641E+07 0.1017E+06 "alien"
66 43 5 1.63 10.37 0.2468E+06 0.2294E+05 "alien"
81 4 29 1.53 10.36 0.1725E+06 0.2364E+05 "alien"
60 40 26 1.72 10.32 0.3792E+06 0.2578E+05 "alien"
39 18 57 2.18 10.24 0.3885E+07 0.5573E+05 "alien"
70 41 15 1.57 10.19 0.1922E+06 0.2281E+05 "alien"
55 36 41 1.79 10.16 0.4942E+06 0.2967E+05 "alien"
37 4 81 1.88 10.15 0.7202E+06 0.3357E+05 "alien"
56 27 5 2.06 10.14 0.1854E+07 0.3569E+05 "alien"
44 39 29 2.06 10.09 0.1844E+07 0.3805E+05 "alien"
65 46 29 1.56 10.06 0.1898E+06 0.2270E+05 "alien"
List of 33 reflections *NOT* obeying Wilson distribution (sorted by resolution)
Ice rings could occur at (Angstrom):
3.897,3.669,3.441, 2.671,2.249,2.072, 1.948,1.918,1.883,1.721
h k l RES Z Intensity Sigma
82 24 6 1.50 10.74 0.1787E+06 0.1019E+06
67 53 6 1.50 15.85 0.2638E+06 0.1128E+06
80 24 26 1.50 11.41 0.1899E+06 0.9793E+05
60 43 61 1.50 11.77 0.1959E+06 0.9769E+05
69 25 62 1.50 10.67 0.1775E+06 0.8689E+05
46 17 99 1.50 14.16 0.2357E+06 0.9588E+05
72 11 61 1.52 17.34 0.2886E+06 0.2367E+05
81 4 29 1.53 10.36 0.1725E+06 0.2364E+05
65 46 29 1.56 10.06 0.1898E+06 0.2270E+05
70 41 15 1.57 10.19 0.1922E+06 0.2281E+05
79 6 15 1.60 13.10 0.3117E+06 0.2477E+05
66 43 5 1.63 10.37 0.2468E+06 0.2294E+05
70 17 35 1.68 10.96 0.3286E+06 0.2415E+05
73 10 15 1.72 10.98 0.4036E+06 0.2356E+05
60 40 26 1.72 10.32 0.3792E+06 0.2578E+05
72 11 17 1.74 11.64 0.4278E+06 0.2619E+05
66 25 19 1.79 11.89 0.5788E+06 0.2739E+05
55 36 41 1.79 10.16 0.4942E+06 0.2967E+05
69 7 22 1.80 11.06 0.5383E+06 0.2564E+05
66 25 11 1.81 11.88 0.5781E+06 0.2771E+05
61 20 33 1.88 12.54 0.8900E+06 0.3054E+05
57 24 41 1.88 10.91 0.7746E+06 0.2842E+05
37 4 81 1.88 10.15 0.7202E+06 0.3357E+05
56 27 5 2.06 10.14 0.1854E+07 0.3569E+05
44 39 29 2.06 10.09 0.1844E+07 0.3805E+05
39 18 57 2.18 10.24 0.3885E+07 0.5573E+05
44 4 48 2.30 12.38 0.4695E+07 0.6072E+05
44 18 20 2.59 11.08 0.6982E+07 0.7839E+05
41 21 26 2.59 11.09 0.6988E+07 0.7945E+05
23 3 62 2.59 11.06 0.6971E+07 0.9154E+05
34 32 2 2.75 13.44 0.1239E+08 0.1279E+06
24 20 44 2.91 10.45 0.9641E+07 0.1017E+06
35 10 25 3.17 14.39 0.2118E+08 0.2364E+06
cpu time used by XSCALE 25.9 sec
elapsed wall-clock time 28.1 sec
I would like to extract the second-to-last line whose 11th column holds a number followed by an asterisk (xy.z*), together with the lines directly above and below it. The line comes from the table headed SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION.
For example, in this table the line I'm looking for contains "23.2*" in the 11th column (CC(1/2)). I want the second-to-last line with an asterisk because the last one is the line that starts with total, which was much easier to extract with a simple grep command.
So the expected output for the code in this case would be to print the lines:
1.60 209398 16298 16304 100.0% 206.1% 208.5% 209397 1.35 214.6% 48.9* -2 0.693 15466
1.50 273432 20770 20893 99.4% 333.4% 342.1% 273340 0.80 346.9% 23.2* -1 0.644 19495
1.40 33 27 27248 0.1% 42.6% 112.7% 12 0.40 60.3% 88.2 0 0.000 0
The same should work wherever the last asterisk falls in the table.
In my previous question I received this answer:
sed -n '/LIMIT/,/=/{/^\s*\(\S*\s*\)\{10\}[0-9.-]*\*/H;x;s/^.*\n\(.*\n.*\)$/\1/;x;/=/{x;P;q}}' file
That worked really well (thanks, Endoro) for extracting just the second-to-last line with an asterisk in the 11th column, which is what I asked for. Now I need it adjusted slightly, or replaced with a whole new command if you prefer, so that it also prints the lines above and below.
The previous question is "Extracting the second last line from a table using a specific number followed by an asterisk (e.g. xy.z*)".
Any help would be greatly appreciated.
Sam
Code for GNU sed
sed -rn '/LIMIT/,/total/{//!H};/total/{x;s/^.*\n(.*\n)((\s+\S+){10}\s+[0-9.]+\*(\s+\S+){3}\n(\s+\S+){14}).*/\1\2/;p;q}' file
$ sed -rn '/LIMIT/,/total/{//!H};/total/{x;s/^.*\n(.*\n)((\s+\S+){10}\s+[0-9.]+\*(\s+\S+){3}\n(\s+\S+){14}).*/\1\2/;p;q}' file
1.60 209398 16298 16304 100.0% 206.1% 208.5% 209397 1.35 214.6% 48.9* -2 0.693 15466
1.50 273432 20770 20893 99.4% 333.4% 342.1% 273340 0.80 346.9% 23.2* -1 0.644 19495
1.40 33 27 27248 0.1% 42.6% 112.7% 12 0.40 60.3% 88.2 0 0.000 0
A bit dirty but should work:
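awk '
# Collect every line from the table heading through the "total" row.
/^ *SUBSET OF INTENSITY/,/^ *total/ {
    a[++i] = $0     # full line
    b[i] = $11      # 11th field, CC(1/2) on data rows
}
END {
    # Walk backwards from the row above "total" to the last starred CC(1/2).
    for (o = i - 1; o > 1; o--)
        if (b[o] ~ /\*/) {
            print a[o-1] "\n" a[o] "\n" a[o+1]
            break
        }
}' log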
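A slightly stricter variant of the same idea, as a minimal sketch rather than a tested drop-in: it stores only the numeric data rows, so the heading lines can never be printed as the "line above". The function name extract_cc_half is made up for illustration, and the sketch assumes that the first SUBSET OF INTENSITY DATA table in the log is the one of interest and that CC(1/2) is always the 11th whitespace-separated field.

```shell
#!/bin/sh
# extract_cc_half (hypothetical name): print the last data row before "total"
# whose 11th field ends in an asterisk, plus the rows directly above and below.
# Reads the named file, or standard input when no file is given.
extract_cc_half() {
    awk '
        /SUBSET OF INTENSITY DATA/ { in_table = 1; n = 0; next }  # table heading
        !in_table { next }                                        # ignore the rest of the log
        $1 == "total" {
            row[++n] = $0              # keep "total" so it can serve as the line below
            for (i = n - 1; i >= 1; i--)
                if (cc[i] ~ /\*$/) {
                    if (i > 1) print row[i-1]
                    print row[i]
                    print row[i+1]
                    break
                }
            exit
        }
        $1 ~ /^[0-9.]+$/ { row[++n] = $0; cc[n] = $11 }           # numeric data rows only
    ' "$@"
}

# Demo on a cut-down copy of the table from the question.
extract_cc_half <<'EOF'
SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
RESOLUTION NUMBER OF REFLECTIONS COMPLETENESS R-FACTOR R-FACTOR COMPARED I/SIGMA R-meas CC(1/2) Anomal SigAno Nano
LIMIT OBSERVED UNIQUE POSSIBLE OF DATA observed expected Corr
1.70 175117 12897 12899 100.0% 140.0% 140.1% 175116 2.15 145.4% 69.6* -2 0.732 12188
1.60 209398 16298 16304 100.0% 206.1% 208.5% 209397 1.35 214.6% 48.9* -2 0.693 15466
1.50 273432 20770 20893 99.4% 333.4% 342.1% 273340 0.80 346.9% 23.2* -1 0.644 19495
1.40 33 27 27248 0.1% 42.6% 112.7% 12 0.40 60.3% 88.2 0 0.000 0
1.30 0 0 36205 0.0% -99.9% -99.9% 0 -99.00 -99.9% 0.0 0 0.000 0
total 1579771 119964 778303 15.4% 12.8% 13.1% 1579647 14.33 13.4% 99.9* -1 0.755 111306
EOF
```

On a real log it would be called as extract_cc_half file. If the last starred row sits directly above total, the total line is printed as the line below, which matches the behaviour of the awk answer above.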