Year to date vs Year to date last year | Pandas - python-2.7

I would like to calculate the Year to date (YTD) value for this year and compare it to the same period last year in Pandas. My df looks like this:
Month Product A Product B
2015-01-01 24 62
2015-02-01 46 24
2015-03-01 30 70
2015-04-01 26 51
2015-05-01 34 42
2015-06-01 45 35
2015-07-01 25 13
2015-08-01 98 95
2015-09-01 6 81
2015-10-01 93 38
2015-11-01 98 59
2015-12-01 98 1
2016-01-01 67 42
2016-02-01 72 34
2016-03-01 7 6
2016-04-01 19 24
2016-05-01 82 38
2016-06-01 15 79
2016-07-01 49 83
2016-08-01 97 56
The two values I am after for Product A are
YTD = 408 and YTD SPLY = 328 (the sum of Jan-Aug 2016 and the sum of Jan-Aug 2015, respectively).
When a new month is added to the df, I would like the calculation to extend to Jan-Sep, and so on.
Any ideas on how to proceed?

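Not exactly sure what you want, but it looks like you want the cumulative sum within each year. The YTD value is then the last row of the cumulative column (408 for Product A), and the same-period-last-year value is the row for the same month one year earlier (328).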
df[['A_cumsum', 'B_cumsum']] = df.resample('A', on='Month').transform('cumsum')
Month Product A Product B A_cumsum B_cumsum
0 2015-01-01 24 62 24 62
1 2015-02-01 46 24 70 86
2 2015-03-01 30 70 100 156
3 2015-04-01 26 51 126 207
4 2015-05-01 34 42 160 249
5 2015-06-01 45 35 205 284
6 2015-07-01 25 13 230 297
7 2015-08-01 98 95 328 392
8 2015-09-01 6 81 334 473
9 2015-10-01 93 38 427 511
10 2015-11-01 98 59 525 570
11 2015-12-01 98 1 623 571
12 2016-01-01 67 42 67 42
13 2016-02-01 72 34 139 76
14 2016-03-01 7 6 146 82
15 2016-04-01 19 24 165 106
16 2016-05-01 82 38 247 144
17 2016-06-01 15 79 262 223
18 2016-07-01 49 83 311 306
19 2016-08-01 97 56 408 362
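If only the two totals are needed, they can also be computed directly without the cumulative columns. The following is a minimal sketch, assuming Month is already a datetime column and that the latest month present in the frame defines the cut-off; the names latest, in_period, ytd and sply are illustrative and not part of the original answer.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Month": pd.date_range("2015-01-01", periods=20, freq="MS"),
    "Product A": [24, 46, 30, 26, 34, 45, 25, 98, 6, 93, 98, 98,
                  67, 72, 7, 19, 82, 15, 49, 97],
    "Product B": [62, 24, 70, 51, 42, 35, 13, 95, 81, 38, 59, 1,
                  42, 34, 6, 24, 38, 79, 83, 56],
})
products = ["Product A", "Product B"]

# The most recent month decides how far into the year both sums run.
latest = df["Month"].max()
in_period = df["Month"].dt.month <= latest.month

# Year to date: rows of the latest year up to the latest month.
ytd = df.loc[in_period & (df["Month"].dt.year == latest.year), products].sum()
# Same period last year: the same months, one calendar year earlier.
sply = df.loc[in_period & (df["Month"].dt.year == latest.year - 1), products].sum()

print(ytd["Product A"], sply["Product A"])
```

Because the cut-off is taken from the data, appending a September 2016 row would make both sums cover Jan-Sep without further changes.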

Why CGAL isotropic_remeshing generates self-intersections?

It's my first time using CGAL and I'm trying to use CGAL isotropic_remeshing following CGAL guide and examples.
typedef CGAL::Simple_cartesian<double> geometric_kernel;
typedef CGAL::Surface_mesh<geometric_kernel::Point_3> triangle_mesh;
typedef boost::graph_traits<triangle_mesh>::edge_descriptor edge_descriptor;
typedef boost::graph_traits<triangle_mesh>::halfedge_descriptor halfedge_descriptor;
// PMP is the namespace alias used in the CGAL examples.
namespace PMP = CGAL::Polygon_mesh_processing;

// Output-iterator functor: converts each halfedge to its edge and collects it.
struct halfedge2edge
{
    halfedge2edge(const triangle_mesh& m, std::vector<edge_descriptor>& edges)
        : m_mesh(m), m_edges(edges)
    {}
    void operator()(const halfedge_descriptor& h) const
    {
        m_edges.push_back(edge(h, m_mesh));
    }
    const triangle_mesh& m_mesh;
    std::vector<edge_descriptor>& m_edges;
};

// _mesh is a triangle_mesh declared elsewhere (not shown in this excerpt).
void remesh(std::string fname) {
    fmt::print("reading input file {}\n", fname);
    std::ifstream input(fname);
    bool b = CGAL::IO::read_PLY(input, _mesh);
    input.close();
    if (!b) throw std::runtime_error("cannot read input file");
    // this returns false (initial mesh is ok)
    fmt::print("Self-intersection: {}", PMP::does_self_intersect(faces(_mesh), _mesh) ? "YES\n" : "NO\n");
    // collect the border edges and split them before remeshing
    std::vector<edge_descriptor> border;
    PMP::border_halfedges(faces(_mesh), _mesh, boost::make_function_output_iterator(halfedge2edge(_mesh, border)));
    PMP::split_long_edges(border, 0.05, _mesh);
    PMP::isotropic_remeshing(faces(_mesh), 0.05, _mesh, PMP::parameters::number_of_iterations(3).protect_constraints(true));
    // this returns true ...
    fmt::print("Self-intersection: {}", PMP::does_self_intersect(faces(_mesh), _mesh) ? "YES\n" : "NO\n");
}
and this is the PLY file:
ply
format ascii 1.0
element vertex 240
property double x
property double y
property double z
element face 416
property list uchar int vertex_indices
end_header
-0.0677552 1.75428e-11 0.359551
0.559233 -5.07748e-11 -0.00371597
-0.422648 -0.0809302 0.0767055
-0.422648 0.0809302 0.0767055
0.353976 -0.31933 -0.0764848
0.353976 0.31933 -0.0764848
0.18628 -0.105913 0.136968
0.18628 0.105913 0.136968
0.0734964 -0.042885 0.171005
0.0734964 0.042885 0.171005
0.000894007 -0.101015 0.219853
0.000894007 0.101015 0.219853
-0.317115 -0.490302 -0.14088
-0.317115 0.490302 -0.14088
0.0363557 -0.18484 0.0595056
0.0363557 0.18484 0.0595056
0.679924 -0.284836 -0.185795
0.679924 0.284836 -0.185795
-0.902296 1.60765e-11 -0.0147902
-0.334173 -0.308304 -0.0317029
-0.226906 -0.388018 -0.0664415
0.238526 -0.495303 -0.175104
0.682336 -0.249117 -0.140847
-0.527435 -0.227159 0.105247
-0.915072 -0.285449 -0.0849493
-0.588108 -0.384526 0.0274976
-0.414261 -0.558083 -0.173693
-0.709628 -0.546197 -0.209069
-0.279147 -0.551078 -0.191796
-0.441742 -0.0761666 0.0919278
0.0750773 -0.548552 -0.169267
0.151201 -0.0861054 0.142655
-0.328035 -0.0785272 0.130721
-0.188634 -0.160355 0.0781907
-0.319349 -0.157333 -0.00956133
-0.695544 -0.0781028 0.0827349
-0.207819 -0.26498 -0.0189174
-0.186662 -0.497279 -0.102856
-0.0315025 -0.481533 -0.0632797
-0.101123 -0.565625 -0.138967
-0.954181 -0.0987909 -0.0469328
-0.448314 -0.217296 0.00954906
-0.855668 -0.454036 -0.168054
-0.685034 -0.225247 0.0590258
-0.19149 -0.085893 0.23609
-0.346289 -0.446491 -0.0910963
-0.0677861 -0.240685 0.03329
-0.570304 -0.563494 -0.181845
-0.0991308 -0.373187 -0.0199879
0.0256409 -0.0662504 0.220869
0.12104 -0.443556 -0.0725518
-0.559714 -0.0601683 0.126809
0.7294 -0.0778938 -0.0619168
-0.485049 -0.500588 -0.0492544
0.0544319 -0.163186 0.072704
-0.659303 -0.466641 -0.0854264
-0.769765 -0.338041 -0.0415794
-0.0697669 -0.153351 0.179993
0.0177175 -0.341852 0.0051975
0.142156 -0.249748 0.0257109
0.395681 -0.491973 -0.246075
0.585311 -0.0507501 -0.00507005
0.579522 -0.360325 -0.214198
0.375654 -0.398838 -0.122424
-0.0771929 -0.0805348 0.321587
-0.83454 -0.127215 0.00225332
-0.445651 -0.371942 0.010079
0.584187 -0.166126 -0.0560823
0.491718 -0.270257 -0.0989297
-0.334173 0.308304 -0.0317029
-0.226906 0.388018 -0.0664415
0.238526 0.495303 -0.175104
0.682336 0.249117 -0.140847
-0.527435 0.227159 0.105247
-0.915072 0.285449 -0.0849493
-0.588108 0.384526 0.0274976
-0.414261 0.558083 -0.173693
-0.709628 0.546197 -0.209069
-0.279147 0.551078 -0.191796
-0.441742 0.0761666 0.0919278
0.0750773 0.548552 -0.169267
0.151201 0.0861054 0.142655
-0.328035 0.0785272 0.130721
-0.188634 0.160355 0.0781907
-0.319349 0.157333 -0.00956133
-0.695544 0.0781028 0.0827349
-0.207819 0.26498 -0.0189174
-0.186662 0.497279 -0.102856
-0.0315025 0.481533 -0.0632797
-0.101123 0.565625 -0.138967
-0.954181 0.0987909 -0.0469328
-0.448314 0.217296 0.00954906
-0.855668 0.454036 -0.168054
-0.685034 0.225247 0.0590258
-0.19149 0.085893 0.23609
-0.346289 0.446491 -0.0910963
-0.0677861 0.240685 0.03329
-0.570304 0.563494 -0.181845
-0.0991308 0.373187 -0.0199879
0.0256409 0.0662504 0.220869
0.12104 0.443556 -0.0725518
-0.559714 0.0601683 0.126809
0.7294 0.0778938 -0.0619168
-0.485049 0.500588 -0.0492544
0.0544319 0.163186 0.072704
-0.659303 0.466641 -0.0854264
-0.769765 0.338041 -0.0415794
-0.0697669 0.153351 0.179993
0.0177175 0.341852 0.0051975
0.142156 0.249748 0.0257109
0.395681 0.491973 -0.246075
0.585311 0.0507501 -0.00507006
0.579522 0.360325 -0.214198
0.375654 0.398838 -0.122424
-0.0771929 0.0805348 0.321587
-0.83454 0.127215 0.00225332
-0.445651 0.371942 0.010079
0.584187 0.166126 -0.0560823
0.491718 0.270257 -0.0989297
0.244787 0.44166 -0.124611
0.224863 0.422926 -0.103295
0.203555 0.403002 -0.0851674
0.181818 0.382468 -0.0666096
0.160161 0.361466 -0.0494549
0.138703 0.340243 -0.0343855
0.116587 0.315975 -0.0211481
0.0930881 0.292328 -0.010644
0.0683716 0.270062 -0.00112955
0.0437934 0.247639 0.00896137
0.0147427 0.229286 0.0184032
0.244787 -0.44166 -0.124611
0.224863 -0.422926 -0.103295
0.203555 -0.403002 -0.0851674
0.181818 -0.382468 -0.0666096
0.160161 -0.361466 -0.0494549
0.138703 -0.340243 -0.0343855
0.116587 -0.315975 -0.0211481
0.0930881 -0.292328 -0.010644
0.0683716 -0.270062 -0.00112955
0.0437934 -0.247639 0.00896137
0.0147427 -0.229286 0.0184032
0.353943 0.318951 -0.0768402
0.318176 0.310251 -0.0529674
0.28752 0.294134 -0.0220852
0.263653 0.270201 0.00932505
0.248486 0.239857 0.0406095
0.237599 0.20638 0.0706802
0.228386 0.170062 0.097771
0.220138 0.131333 0.121548
0.221814 0.0891843 0.140093
0.237727 0.0468576 0.144191
0.240837 -4.4941e-11 0.144112
0.353943 -0.318951 -0.0768402
0.318176 -0.310251 -0.0529674
0.28752 -0.294134 -0.0220852
0.263653 -0.270201 0.00932505
0.248486 -0.239857 0.0406095
0.237599 -0.20638 0.0706802
0.228386 -0.170062 0.097771
0.220138 -0.131333 0.121548
0.221814 -0.0891843 0.140093
0.237727 -0.0468576 0.144191
0.334994 0.290335 -0.0602502
0.324017 0.260279 -0.0388522
0.31833 0.233087 -0.011531
0.312891 0.206844 0.0165917
0.306376 0.17827 0.0421742
0.301065 0.146996 0.0646615
0.297445 0.113434 0.0839683
0.295181 0.07813 0.10025
0.293328 0.0406645 0.110471
0.290482 2.27582e-11 0.115974
0.334994 -0.290335 -0.0602502
0.324017 -0.260279 -0.0388522
0.31833 -0.233087 -0.011531
0.312891 -0.206844 0.0165917
0.306376 -0.17827 0.0421742
0.301065 -0.146996 0.0646615
0.297445 -0.113434 0.0839683
0.295181 -0.07813 0.10025
0.293328 -0.0406645 0.110471
0.347914 0.280814 -0.0613852
0.33858 0.251578 -0.0407904
0.333526 0.224505 -0.0155032
0.330611 0.199101 0.010933
0.326488 0.172047 0.0361431
0.321941 0.142269 0.0582371
0.318458 0.109138 0.0750352
0.316637 0.0738964 0.0871513
0.31877 0.0372044 0.0929536
0.318747 -5.36519e-12 0.0939851
0.347914 -0.280814 -0.0613852
0.33858 -0.251578 -0.0407904
0.333526 -0.224505 -0.0155032
0.330611 -0.199101 0.010933
0.326488 -0.172047 0.0361431
0.321941 -0.142269 0.0582371
0.318458 -0.109138 0.0750352
0.316637 -0.0738964 0.0871513
0.31877 -0.0372044 0.0929536
0.383912 0.288495 -0.0737723
0.401167 0.260743 -0.0567028
0.416481 0.23323 -0.0356612
0.427658 0.205483 -0.0128446
0.433461 0.176582 0.0104455
0.435967 0.146037 0.0318464
0.436745 0.113498 0.0486202
0.434922 0.0783314 0.0602004
0.432704 0.0414137 0.0672302
0.439201 4.6519e-12 0.0584096
0.383912 -0.288495 -0.0737723
0.401167 -0.260743 -0.0567028
0.416481 -0.23323 -0.0356612
0.427658 -0.205483 -0.0128446
0.433461 -0.176582 0.0104455
0.435967 -0.146037 0.0318464
0.436745 -0.113498 0.0486202
0.434922 -0.0783314 0.0602004
0.432704 -0.0414137 0.0672302
-0.399426 0.601789 -0.253601
-0.434361 0.571336 -0.18351
-0.469967 0.544585 -0.115531
-0.50467 0.50818 -0.052334
-0.524224 0.45265 0.00308497
-0.529609 0.384371 0.046803
-0.53244 0.309845 0.0804658
-0.54185 0.233414 0.106686
-0.558825 0.155064 0.120583
-0.554692 0.074207 0.125803
-0.526166 3.22982e-11 0.133583
-0.399426 -0.601789 -0.253601
-0.434361 -0.571336 -0.18351
-0.469967 -0.544585 -0.115531
-0.50467 -0.50818 -0.052334
-0.524224 -0.45265 0.00308497
-0.529609 -0.384371 0.046803
-0.53244 -0.309845 0.0804658
-0.54185 -0.233414 0.106686
-0.558825 -0.155064 0.120583
-0.554692 -0.074207 0.125803
3 221 76 220
3 131 21 130
3 33 46 57
3 2 32 82
3 82 94 83
3 37 20 45
3 235 234 66
3 2 3 79
3 82 32 94
3 39 30 38
3 12 37 45
3 50 21 131
3 189 190 208
3 2 41 34
3 13 76 103
3 77 97 219
3 94 0 114
3 124 109 143
3 86 84 83
3 47 27 230
3 234 53 45
3 107 94 114
3 26 12 53
3 18 40 65
3 27 47 55
3 37 39 38
3 25 55 234
3 20 19 45
3 237 23 238
3 160 6 159
3 36 20 48
3 40 18 90
3 74 115 106
3 35 18 65
3 56 25 43
3 227 3 91
3 18 35 85
3 19 20 36
3 32 2 34
3 229 29 79
3 3 82 84
3 3 2 82
3 13 95 87
3 46 36 48
3 21 30 60
3 36 46 33
3 44 57 64
3 30 21 50
3 140 14 57
3 210 62 68
3 132 131 63
3 59 14 139
3 54 59 158
3 57 14 54
3 3 84 91
3 223 75 224
3 149 81 150
3 64 54 10
3 215 214 67
3 104 81 147
3 158 6 31
3 18 85 115
3 85 93 115
3 75 225 224
3 115 93 106
3 106 75 105
3 84 69 91
3 80 88 100
3 69 70 95
3 87 70 98
3 69 86 70
3 15 109 128
3 95 70 87
3 96 107 129
3 100 124 123
3 13 87 78
3 80 89 88
3 71 80 100
3 80 71 110
3 206 207 111
3 61 52 111
3 141 5 142
3 118 17 112
3 24 42 56
3 55 25 56
3 232 26 53
3 42 27 56
3 27 55 56
3 40 24 65
3 43 35 65
3 37 12 28
3 230 26 231
3 47 230 231
3 231 26 232
3 47 231 232
3 55 47 233
3 47 232 233
3 12 26 28
3 26 230 28
3 233 232 53
3 53 12 45
3 55 233 234
3 233 53 234
3 25 235 236
3 45 19 66
3 28 230 39
3 37 28 39
3 25 234 235
3 234 45 66
3 235 66 236
3 43 25 236
3 85 35 51
3 23 41 238
3 66 19 236
3 236 19 41
3 236 23 237
3 43 236 237
3 23 236 41
3 239 238 29
3 32 33 44
3 38 50 58
3 24 56 65
3 56 43 65
3 90 18 115
3 74 90 115
3 35 43 238
3 43 237 238
3 238 41 29
3 35 238 239
3 51 35 239
3 41 2 29
3 51 239 29
3 29 2 79
3 41 19 34
3 19 36 34
3 34 36 33
3 32 34 33
3 51 29 229
3 85 51 101
3 101 51 229
3 101 229 79
3 93 85 227
3 85 101 228
3 227 85 228
3 228 101 79
3 3 227 79
3 227 228 79
3 86 69 84
3 84 82 83
3 107 114 11
3 83 94 107
3 38 30 50
3 130 60 63
3 50 131 132
3 154 133 153
3 133 132 153
3 59 134 154
3 63 60 62
3 21 60 130
3 20 37 48
3 37 38 48
3 48 38 58
3 46 48 58
3 46 140 57
3 138 59 139
3 139 14 140
3 58 138 139
3 46 58 140
3 58 139 140
3 134 133 154
3 59 138 137
3 134 59 135
3 50 132 133
3 135 59 136
3 50 133 134
3 58 50 135
3 50 134 135
3 136 59 137
3 58 135 136
3 138 58 137
3 58 136 137
3 195 194 213
3 152 4 210
3 4 62 210
3 59 154 155
3 4 152 153
3 59 155 156
3 155 154 172
3 131 130 63
3 211 210 68
3 4 63 62
3 63 4 153
3 132 63 153
3 153 152 172
3 154 153 172
3 156 155 173
3 155 172 173
3 156 173 174
3 192 191 210
3 191 152 210
3 157 174 175
3 156 174 157
3 62 16 68
3 22 52 67
3 193 192 211
3 192 210 211
3 193 211 212
3 194 212 213
3 211 68 212
3 68 16 67
3 0 94 44
3 94 32 44
3 0 44 64
3 33 57 44
3 0 64 49
3 14 59 54
3 64 57 54
3 6 160 31
3 158 157 175
3 59 156 157
3 10 54 31
3 59 157 158
3 6 158 159
3 161 180 151
3 31 161 151
3 54 158 31
3 49 10 31
3 8 0 49
3 64 10 49
3 0 8 9
3 8 49 31
3 108 88 98
3 86 83 96
3 99 9 81
3 31 160 161
3 109 15 104
3 114 0 99
3 0 9 99
3 11 99 81
3 114 99 11
3 15 107 104
3 107 11 104
3 104 11 81
3 81 7 147
3 8 31 151
3 149 7 81
3 7 149 148
3 146 109 147
3 160 159 178
3 193 212 194
3 158 175 176
3 159 158 177
3 158 176 177
3 16 22 67
3 213 68 214
3 159 177 178
3 212 68 213
3 214 68 67
3 196 195 214
3 195 213 214
3 196 214 215
3 150 81 151
3 198 197 216
3 196 215 197
3 161 160 179
3 160 178 179
3 9 8 151
3 161 179 180
3 197 215 216
3 216 215 67
3 81 9 151
3 1 61 111
3 150 151 170
3 199 198 217
3 198 216 217
3 218 217 61
3 199 217 218
3 217 216 67
3 67 52 61
3 1 218 61
3 217 67 61
3 218 1 209
3 170 151 171
3 151 180 171
3 150 170 169
3 148 149 168
3 149 150 169
3 199 218 190
3 190 218 209
3 208 1 111
3 208 190 209
3 1 208 209
3 167 148 168
3 168 149 169
3 186 187 205
3 147 167 166
3 189 208 207
3 188 189 207
3 187 188 206
3 204 117 118
3 165 147 166
3 205 187 206
3 205 117 204
3 206 188 207
3 207 208 111
3 111 52 102
3 205 206 117
3 206 111 117
3 117 111 102
3 97 221 220
3 93 75 106
3 106 77 92
3 74 106 92
3 97 77 105
3 77 106 105
3 226 73 225
3 91 69 225
3 93 227 226
3 227 73 226
3 73 227 91
3 73 91 225
3 75 93 225
3 93 226 225
3 105 75 223
3 105 223 222
3 224 225 116
3 225 69 116
3 95 13 103
3 223 224 116
3 95 223 116
3 69 95 116
3 223 95 103
3 97 105 222
3 103 76 221
3 221 97 222
3 103 221 222
3 223 103 222
3 88 89 87
3 88 87 98
3 76 13 78
3 87 89 78
3 219 97 220
3 76 219 220
3 219 76 78
3 89 219 78
3 128 96 129
3 83 107 96
3 109 145 144
3 128 109 127
3 107 15 129
3 15 128 129
3 70 86 98
3 96 128 108
3 108 128 127
3 143 109 144
3 145 109 146
3 109 104 147
3 7 148 147
3 145 146 164
3 148 167 147
3 146 147 165
3 127 109 126
3 108 127 126
3 124 108 125
3 109 124 125
3 125 108 126
3 109 125 126
3 119 120 113
3 110 71 119
3 163 145 164
3 144 145 163
3 86 96 98
3 96 108 98
3 108 124 100
3 88 108 100
3 122 143 142
3 124 143 123
3 121 100 122
3 121 122 142
3 122 100 123
3 143 122 123
3 71 100 119
3 100 121 120
3 119 100 120
3 110 119 113
3 182 201 200
3 143 144 163
3 182 183 201
3 164 146 165
3 186 205 204
3 183 184 202
3 185 186 204
3 184 185 203
3 5 141 200
3 143 163 162
3 201 183 202
3 200 201 118
3 202 184 203
3 117 17 118
3 203 185 204
3 203 204 118
3 142 5 113
3 120 121 113
3 121 142 113
3 142 143 162
3 141 142 162
3 113 5 112
3 181 182 200
3 141 181 200
3 202 203 118
3 201 202 118
3 17 117 72
3 117 102 72
3 5 200 112
3 200 118 112
3 110 113 112
My program reads the PLY file containing the mesh (stored in _mesh; it does not self-intersect) and runs isotropic remeshing, but afterwards CGAL's does_self_intersect returns true. Is this normal? If not, what did I do wrong?
I noticed that I was using the Simple_cartesian kernel, so I replaced it with Exact_predicates_inexact_constructions_kernel.
Now, with the same parameters (target_edge_length = 0.05, nb_iter = 3), CGAL::Polygon_mesh_processing::does_self_intersect returns false.
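One plausible reason the kernel matters is that geometric predicates evaluated in plain double arithmetic can return the wrong sign for nearly degenerate input, whereas an exact-predicates kernel always returns the correct sign. The sketch below is not CGAL code; it is a hypothetical orient2d helper in Python that shows the effect on a hand-picked triple of points, comparing float evaluation with exact rational evaluation.

```python
from fractions import Fraction

def orient2d(ax, ay, bx, by, cx, cy):
    """Sign of the 2D orientation determinant of (a, b, c).

    The arithmetic is done in whatever number type is passed in,
    so floats give a rounded result and Fractions give an exact one.
    """
    det = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)
    return (det > 0) - (det < 0)

# Three points that are almost, but not exactly, collinear.
# All coordinates are exactly representable as doubles.
pts = (0.0, 0.0,
       1.0 + 2.0 ** -52, 1.0,
       1.0, 1.0 - 2.0 ** -53)

float_sign = orient2d(*pts)                  # rounding hides the tiny area
exact_sign = orient2d(*map(Fraction, pts))   # same inputs, exact arithmetic

print(float_sign, exact_sign)
```

Here the float version reports the points as collinear while the exact version reports a counter-clockwise turn. Whether this is what happened in the remeshing run above is an assumption; the thread only establishes that switching kernels made the self-intersection report go away.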

Pandas DataFrame: How to get a min value in a vectorized way?

I have a pandas dataframe:
import numpy
import pandas
df1 = abs((pandas.DataFrame(numpy.random.randn(20, 8))*100).astype(int))
df1.columns = list('abcdefgh')
df1.index = pandas.date_range('1/1/2014', periods=20)
How would I create a new column that will give me the minimum value of the first half of the current row and the last 3 values in the previous row?
For example, the first five rows in the created column would be:
NaN
12
4
14
21
Here is one way to do it. Basically, you first shift the last three columns down by one row, then combine them with the first 4 columns, and finally take the row-wise min.
import numpy
import pandas
# your data
# ===================================
numpy.random.seed(0)
df1 = abs((pandas.DataFrame(numpy.random.randn(20, 8))*100).astype(int))
df1.columns = list('abcdefgh')
df1.index = pandas.date_range('1/1/2014', periods=20)
# processing
# ===================================
# first half of the current row (columns a-d) plus the last 3 columns of the previous row
df1['custom_min'] = pandas.concat([df1[df1.columns[:4]], df1[df1.columns[-3:]].shift(1)], axis=1).min(axis=1)
print(df1)
a b c d e f g h custom_min
2014-01-01 176 40 97 224 186 97 95 15 40
2014-01-02 10 41 14 145 76 12 44 33 10
2014-01-03 149 20 31 85 255 65 86 74 12
2014-01-04 226 145 4 18 153 146 15 37 4
2014-01-05 88 198 34 15 123 120 38 30 15
2014-01-06 104 142 170 195 50 43 125 77 30
2014-01-07 161 21 89 38 51 118 2 42 21
2014-01-08 6 30 63 36 67 35 81 172 2
2014-01-09 17 40 163 46 90 5 72 12 17
2014-01-10 113 123 40 68 87 57 31 5 5
2014-01-11 116 90 46 153 148 189 117 17 5
2014-01-12 107 105 40 122 20 97 35 70 17
2014-01-13 1 178 12 40 188 134 127 96 1
2014-01-14 117 194 41 74 192 148 186 90 41
2014-01-15 86 191 26 80 94 15 61 92 26
2014-01-16 37 109 29 132 69 14 43 184 15
2014-01-17 67 40 76 53 67 3 63 67 14
2014-01-18 57 20 39 109 149 43 16 63 3
2014-01-19 238 94 91 111 131 46 6 171 16
2014-01-20 74 82 9 66 112 107 114 43 6
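Note that the first row in the output above is 40 rather than NaN, because min skips missing values by default. If the first row should be NaN as in the question's expected output, one option is to pass skipna=False. This is a sketch of that variant; first_half and prev_last3 are illustrative names.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df1 = abs((pd.DataFrame(np.random.randn(20, 8)) * 100).astype(int))
df1.columns = list("abcdefgh")
df1.index = pd.date_range("1/1/2014", periods=20)

# First half of the current row.
first_half = df1[list("abcd")]
# Last three values of the previous row (the first row has no predecessor, so NaN).
prev_last3 = df1[list("fgh")].shift(1)

# skipna=False lets the missing previous row propagate as NaN.
df1["custom_min"] = pd.concat([first_half, prev_last3], axis=1).min(axis=1, skipna=False)

print(df1["custom_min"].head())
```

Selecting the columns by name before adding custom_min also avoids the new column being picked up by a positional slice if the cell is run twice.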

AWK - Printing a specific pattern

I have a file that looks like this:
gene_id_100100 sp|Q53IZ1|ASDP_PSESP 35.81 148 90 2 13 158 6 150 6e-27 109 158 531
gene_id_100600 sp|Q49W80|Y1834_STAS1 31.31 99 63 2 1 95 279 376 7e-07 50.1 113 402
gene_id_100 sp|A7TSV7|PAN1_VANPO 36.36 44 24 1 41 80 879 922 1.9 32.3 154 1492
gene_id_10100 sp|P37348|YECE_ECOLI 32.77 177 104 6 3 172 2 170 2e-13 71.2 248 272
gene_id_101100 sp|B0U4U5|SURE_XYLFM 29.11 79 41 3 70 148 143 206 0.14 35.8 175 262
gene_id_101600 sp|Q5AWD4|BGLM_EMENI 35.90 39 25 0 21 59 506 544 4.9 30.4 129 772
gene_id_102100 sp|P20374|COX1_APILI 38.89 36 22 0 3 38 353 388 0.54 32.0 92 521
gene_id_102600 sp|Q46127|SYW_CLOLO 79.12 91 19 0 1 91 1 91 5e-44 150 92 341
gene_id_103100 sp|Q9UJX6|ANC2_HUMAN 53.57 28 13 0 11 38 608 635 2.1 28.9 42 822
gene_id_103600 sp|C1DA02|SYL_LARHH 35.59 59 30 2 88 138 382 440 4.6 30.8 140 866
gene_id_104100 sp|B8DHP2|PROB_LISMH 25.88 85 50 2 37 110 27 109 0.81 32.3 127 276
gene_id_105100 sp|A1ALU1|RL3_PELPD 31.88 69 42 2 14 77 42 110 2.2 31.6 166 209
gene_id_105600 sp|P59696|T200_SALTY 64.00 125 45 0 5 129 3 127 9e-58 182 129 152
gene_id_10600 sp|G3XDA3|CTPH_PSEAE 28.38 74 48 1 4 77 364 432 0.56 31.6 81 568
gene_id_106100 sp|P94369|YXLA_BACSU 35.00 100 56 3 25 120 270 364 4e-08 53.9 120 457
gene_id_106600 sp|P34706|SDC3_CAEEL 60.00 20 8 0 18 37 1027 1046 2.3 32.7 191 2150
Now I need to extract the gene ID, which is the value between the two | characters in the second column. In other words, I need an output that looks like this:
Q53IZ1
Q49W80
A7TSV7
P37348
B0U4U5
Q5AWD4
P20374
Q46127
Q9UJX6
C1DA02
B8DHP2
A1ALU1
P59696
G3XDA3
P94369
P34706
I have been trying to do it using the following command:
awk '{for(i=1;i<=NF;++i){ if($i==/[A-Z][A-Z0-9][A-Z0-9][A-Z0-9][A-Z0-9][A-Z0-9]/){print $i} } }'
but it doesn't seem to work.
Pattern matching is not really necessary. I'd suggest
awk -F\| '{print $2}' filename
This splits the line into |-delimited fields and prints the second of them.
Alternatively,
cut -d\| -f 2 filename
achieves the same.
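For comparison, the same field-splitting idea can be sketched in Python. The function name extract_ids is made up for this example, and unlike the awk command it skips lines that do not contain two | characters instead of printing an empty line for them.

```python
def extract_ids(lines):
    """Return the second |-delimited field of each line that has one."""
    ids = []
    for line in lines:
        fields = line.split("|")
        # A well-formed line splits into three parts: prefix, ID, rest.
        if len(fields) >= 3:
            ids.append(fields[1])
    return ids

sample = [
    "gene_id_100100 sp|Q53IZ1|ASDP_PSESP 35.81 148 90 2 13 158 6 150 6e-27 109 158 531",
    "gene_id_100600 sp|Q49W80|Y1834_STAS1 31.31 99 63 2 1 95 279 376 7e-07 50.1 113 402",
]

for gene_id in extract_ids(sample):
    print(gene_id)
```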

how to print a really big datastructure in clojure?

When I try to print a really long array, it gets cut off at a certain length
[-1 -40 -1 -32 0 16 74 70 73 70 0 1 1 0 0 1 0 1 0 0 -1
-37 0 67 0 8 6 6 7 6 5 8 7 7 7 9 9 8 10 12 20 13 12 11
11 12 25 18 19 15 20 29 26 31 30 29 26 28 28 32 36 46 39
32 34 44 35 28 28 40 55 41 44 48 49 52 52 52 31 39 57 61
56 50 60 46 51 52 50 -1 -37 0 67 1 9 9 9 12 11 12 ...]
I would like it not to do that if I'm persisting a data structure to file. How can this be done?
The special variable *print-length* determines how many items of a given collection are printed. Like any other dynamic var, you can use binding to set its value within a block; binding it to nil removes the limit.
user> (binding [*print-length* 2] (prn (range 200)))
(0 1 ...)
nil
user> (binding [*print-length* nil] (prn (range 200)))
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199)
nil

Regarding Standard Oxford Format for vlfeat sift

One of my senior classmates gave me a data set for experimenting with vlfeat's SIFT. However, the frame part of her extracted SIFT data contains 5 dimensions. An example is given below:
192
9494
262.08 749.211 0.00295391 -0.00030945 0.00583025 0 0 0 45 84 107 86 8 10 49 31 21 32 37 46 50 11 23 49 60 29 30 24 17 4 15 67 25 28 47 13 11 27 9 0 40 117 99 27 3 117 117 39 19 11 18 16 32 8 27 50 117 102 20 23 18 2 10 36 45 47 84 37 16 36 31 9 50 112 52 12 9 117 36 6 4 3 15 54 117 9 3 2 31 94 101 92 23 0 20 47 36 38 14 1 0 34 19 39 52 27 0 0 31 6 14 18 29 24 13 11 11 12 10 3 1 4 25 29 5 0 5 6 3 12 29 35 2 93 73 61 50 123 118 100 109 58 44 79 122 120 108 103 87 92 61 28 33 55 107 123 123 37 73 60 32 93 123 123 89 118 118 77 66 118 118 63 96 118 94 60 27 41 74 108 118 107 81 107 118 118 43 73 64 118 118 118 56 45 38 27 58
432.424 57.2287 0.00285143 -0.00048992 0.00292525 10 12 19 26 88 43 14 10 3 4 44 50 125 74 0 1 2 4 47 34 17 3 0 0 3 3 8 6 1 0 0 1 11 12 14 17 43 37 10 6 35 36 125 77 47 10 5 13 2 7 125 125 125 29 0 2 1 3 11 15 33 5 1 0 36 14 7 8 102 64 37 27 41 8 2 2 55 53 103 125 4 2 2 5 125 125 41 28 1 3 4 7 32 11 3 1 46 29 6 7 125 57 3 3 49 11 0 1 90 34 19 31 10 3 3 6 122 33 10 9 0 2 11 10 7 2 2 1 35 64 129 129 129 93 48 44 24 55 129 117 129 71 41 19 44 65 76 58 129 129 129 89 42 48 57 96 129 129 90 55 133 118 58 42 58 42 133 133 133 62 24 17 18 12 133 133 133 133 133 125 78 33 17 29 133 133 82 45 23 11 13 44
... // the list keeps on going for all keypoints.
This file simply contains the descriptor data of an image. There are a few things I need to know:
What are the first two values, '192' and '9494'?
What is the 5th value for each keypoint? vlfeat's SIFT normally gives 4 values for a keypoint's frame.
So I asked her what this 5th dimension is, and she told me to search for the "standard Oxford format" for SIFT features.
I tried searching for this standard Oxford format and SIFT features, but had no luck finding it. If somebody knows anything about this, could you please point me in the right direction?
192 is the descriptor length, and 9494 is the number of keypoints in the file.
Each of the remaining lines consists of [WORD_ID] [X] [Y] [A] [B] [C].
X and Y are the feature centroid, and A, B, C define the parameters of
the ellipse in the following equation: A*(x-X)^2 + 2*B*(x-X)*(y-Y) + C*(y-Y)^2 = 1
You can check the official website for the format here.
If you are using the VLFeat package, you can read here how to read a file in Oxford format.
If you are curious how the file format is read, see VLFeat's vl_ubcread function. Here is the code.
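To make the layout concrete, here is a minimal Python sketch of a reader. It assumes the layout visible in the question's sample (descriptor length, then keypoint count, then per keypoint x y a b c followed by the descriptor values) and does not handle a leading WORD_ID field. The names parse_oxford_features and ellipse_value are hypothetical and are not part of VLFeat.

```python
def parse_oxford_features(text):
    """Parse 'dim, count, then x y a b c d1..d_dim per keypoint' from a string."""
    tokens = text.split()
    dim = int(tokens[0])      # descriptor length (192 in the question)
    count = int(tokens[1])    # number of keypoints (9494 in the question)
    values = [float(t) for t in tokens[2:]]
    stride = 5 + dim          # 5 frame values plus the descriptor
    if len(values) != count * stride:
        raise ValueError("file size does not match the header")
    features = []
    for i in range(count):
        row = values[i * stride:(i + 1) * stride]
        x, y, a, b, c = row[:5]
        features.append({"x": x, "y": y, "a": a, "b": b, "c": c,
                         "descriptor": row[5:]})
    return dim, features

def ellipse_value(feature, px, py):
    """Left-hand side of a*(x-X)^2 + 2*b*(x-X)*(y-Y) + c*(y-Y)^2 at (px, py)."""
    dx, dy = px - feature["x"], py - feature["y"]
    return (feature["a"] * dx * dx
            + 2 * feature["b"] * dx * dy
            + feature["c"] * dy * dy)

# Tiny synthetic file: descriptor length 3, two keypoints.
sample = "3\n2\n1.0 2.0 0.25 0.0 1.0 7 8 9\n4.0 5.0 1.0 0.0 1.0 1 2 3\n"
dim, features = parse_oxford_features(sample)
print(dim, len(features), ellipse_value(features[0], 3.0, 2.0))
```

A point lies on the keypoint's ellipse when ellipse_value returns 1, which is how the fifth frame value fits in: three numbers (a, b, c) are needed to describe an ellipse, compared with the scale and orientation of a standard SIFT frame.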