I am trying to sort this list from the lowest value (399||1) to the highest (11064||2) while preserving the provided data format, so that it can be reused in an API request loop.
As you can see below, sorted() is not working as (I) expected. This is Python 2.7.
It looks like it sorts in pieces. Why would 1000-1100 come before 300-700, and then 8000? I cannot find this same issue posted anywhere.
sorted_d = sorted(d)
print sorted_d
Run:
[u'1053||1', u'1092||2', u'1093||1', u'1094||1', u'1094||2', u'1095||1',
u'1095||2', u'1096||7', u'1096||8', u'1097||7', u'1097||8', u'11064||1',
u'11064||2', u'399||1', u'412||1', u'412||2', u'413||1', u'414||1',
u'434||2', u'616||1', u'617||1', u'618||1', u'619||1', u'620||1', u'621||1',
u'622||1', u'727||1', u'8096||1', u'8097||1', u'8099||1', u'8101||1',
u'8105||1', u'8112||1', u'8113||1', u'8140||1', u'8141||1', u'8142||1',
u'8143||1', u'8144||1', u'8146||2', u'8150||1', u'8152||1', u'8153||1',
u'8154||1', u'8157||1', u'8158||1', u'8159||1', u'8160||1', u'8161||1',
u'8162||1', u'8163||1', u'8164||1', u'8165||1', u'8166||1', u'8167||1',
u'8168||1', u'8169||1', u'8170||1', u'8171||1', u'8172||1', u'8173||1',
u'8174||1', u'8175||1', u'8184||2', u'8184||3', u'8185||2', u'8185||3',
u'8186||5', u'8186||6', u'8187||1', u'8188||2', u'8190||2', u'8191||1']
Assistance greatly appreciated.
Thx
You could also split each string on '||' and pass the integer value of the first part as the key parameter:
sorted_d = sorted(d, key = lambda x: int(x.split('||')[0]))
print sorted_d
[u'399||1', u'412||1', u'412||2', u'413||1', u'414||1', u'434||2', u'616||1', u'617||1', u'618||1', u'619||1', u'620||1', u'621||1', u'622||1', u'727||1', u'1053||1', u'1092||2', u'1093||1', u'1094||1', u'1094||2', u'1095||1', u'1095||2', u'1096||7', u'1096||8', u'1097||7', u'1097||8', u'8096||1', u'8097||1', u'8099||1', u'8101||1', u'8105||1', u'8112||1', u'8113||1', u'8140||1', u'8141||1', u'8142||1', u'8143||1', u'8144||1', u'8146||2', u'8150||1', u'8152||1', u'8153||1', u'8154||1', u'8157||1', u'8158||1', u'8159||1', u'8160||1', u'8161||1', u'8162||1', u'8163||1', u'8164||1', u'8165||1', u'8166||1', u'8167||1', u'8168||1', u'8169||1', u'8170||1', u'8171||1', u'8172||1', u'8173||1', u'8174||1', u'8175||1', u'8184||2', u'8184||3', u'8185||2', u'8185||3', u'8186||5', u'8186||6', u'8187||1', u'8188||2', u'8190||2', u'8191||1', u'11064||1', u'11064||2']
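If the number after the || should also be compared numerically, a tuple key is one option. This is only a minimal sketch: the helper name numeric_key and the short sample list are made up for illustration, and it assumes every item has the form 'digits||digits'.

```python
def numeric_key(item):
    # Split '1094||2' into ['1094', '2'] and compare both parts as integers.
    return tuple(int(part) for part in item.split('||'))

# Short hypothetical sample in the same format as the question's list.
d = [u'1094||2', u'11064||1', u'399||1', u'1094||1', u'8096||1']

sorted_d = sorted(d, key=numeric_key)
print(sorted_d)
```

The strings themselves are left untouched, so the sorted list can still be passed on in the original format.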
This happens because sorted() treats values like '1053||1' as strings and compares them character by character instead of as numbers. In effect, it sorts in this kind of ascending order:
1
10
100
1000
2
20
200
2000
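The difference can be checked quickly with the same values. This is just an illustrative sketch; the variable names are made up.

```python
values = ['1', '10', '100', '1000', '2', '20', '200', '2000']

# Plain string sort: compared character by character, so '1000' < '2'.
as_strings = sorted(values)

# Numeric sort: each string is converted with int() only for the comparison.
as_numbers = sorted(values, key=int)

print(as_strings)
print(as_numbers)

# The same effect explains the original list: '1' < '3', so '1053||1' sorts first.
print('1053||1' < '399||1')
```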
I need help to figure out how to get parent index from child index, child-level and parent-level using Python.
I have a dataset with three columns: index, child-level and parent-level.
The records are in order of hierarchy.
Index is just the line number of the record.
Child-level is a number indicating the record's level in the hierarchy of nested parent-child records.
Parent-level = child-level - 1
My challenge is that, for each record, I want to use Python to get the parent's index.
I suspect a list comprehension might be used to get the max index value where the self-join index < the child's index and the self-join level = the child's parent-level.
This is a visual representation of the data set.
This is sample data and expected result. Goal is to get parent index.
Index,Child-Level,Parent-Level,Parent-Index
1,1,1,1
2,2,1,1
4,4,3,3
9,9,8,8
3,3,2,2
5,5,4,4
8,8,7,7
6,6,5,5
7,7,6,6
10,10,9,9
11,11,10,10
12,12,11,11
13,13,12,12
14,14,13,13
15,14,13,13
16,14,13,13
17,14,13,13
18,14,13,13
19,14,13,13
20,14,13,13
21,13,12,12
22,13,12,12
23,13,12,12
24,14,13,23
25,14,13,23
26,14,13,23
27,11,10,10
28,9,8,8
29,9,8,8
30,9,8,8
31,9,8,8
32,9,8,8
33,9,8,8
34,9,8,8
35,8,7,7
36,9,8,35
37,10,9,36
38,11,10,37
39,11,10,37
40,12,11,39
41,12,11,39
42,13,12,41
43,13,12,41
44,13,12,41
45,11,10,37
46,12,11,45
47,13,12,46
48,14,13,47
49,14,13,47
50,14,13,47
51,14,13,47
52,14,13,47
53,14,13,47
54,14,13,47
55,13,12,46
56,13,12,46
57,13,12,46
58,9,8,35
59,9,8,35
60,9,8,35
61,9,8,35
62,8,7,7
63,8,7,7
64,8,7,7
65,8,7,7
66,8,7,7
67,8,7,7
68,8,7,7
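One possible approach, as a minimal sketch rather than a definitive answer: since the records are in hierarchy order, the parent of a record is the most recent earlier record whose level is child-level - 1, so it is enough to remember the last index seen at each level. The function name parent_indexes and the small sample below are hypothetical. The sketch assumes levels never skip on the way down, and it returns None for a top-level record (the sample data above lists the record itself instead).

```python
def parent_indexes(records):
    """Return {index: parent_index} for (index, level) pairs in hierarchy order."""
    last_seen = {}  # level -> index of the most recent record at that level
    parents = {}
    for index, level in records:
        # The parent is the most recent record one level up; None if there is none.
        parents[index] = last_seen.get(level - 1)
        last_seen[level] = index
    return parents

# Hypothetical sample: (index, child-level) pairs in hierarchy order.
records = [(1, 1), (2, 2), (3, 3), (4, 3), (5, 2), (6, 3)]

result = parent_indexes(records)
print(result)
```

This makes a single pass over the data, which avoids rescanning all earlier rows for every record as a self-join style list comprehension would.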
Sorry if the title is a bit convoluted; I found this particular question hard to phrase. Basically, I have two lists, and I'm attempting to modify the first one using information from the second, in Notepad++.
LIST 1 (Not the entire list):
2000,4031161,1,1,1008,1000000
2000,4031162,1,1,1008,1000000
100100,4000019,1,1,0,600000
100100,2000000,1,1,0,20000
100100,2040002,1,1,0,300
100100,2041001,1,1,0,300
100100,2060000,1,1,0,30000
100100,4010000,1,1,0,9000
100100,4020000,1,1,0,9000
100100,2061000,1,1,0,30000
100100,1002067,1,1,0,1500
100100,2010009,1,1,0,20000
100100,2380000,1,1,0,1000
100100,0,4,6,0,400000
100101,4000000,1,1,0,600000
100101,2041006,1,1,0,300
100101,2000000,1,1,0,20000
100101,4020001,1,1,0,9000
100101,2060000,1,1,0,30000
100101,4010001,1,1,0,9000
100101,2061000,1,1,0,30000
100101,1040013,1,1,0,800
100101,1041012,1,1,0,800
100101,1060004,1,1,0,800
100101,1040017,1,1,0,800
100101,1060013,1,1,0,800
100101,2010009,1,1,0,20000
100101,2380001,1,1,0,1000
100101,0,8,12,0,400000
100120,0,1,5,0,400000
100121,0,10,14,0,400000
100121,4000483,1,1,0,400000
100130,4000493,1,1,0,600000
100130,2010000,1,1,0,20000
100130,2010009,1,1,0,20000
100130,4010005,1,1,0,9000
100130,4020005,1,1,0,9000
100130,2040003,1,1,0,300
100130,1002008,1,1,0,1500
100130,1040010,1,1,0,800
100130,1041004,1,1,0,800
100130,1060007,1,1,0,800
100130,2380015,1,1,0,1000
100131,4000494,1,1,0,600000
100131,2000000,1,1,0,20000
100131,2010009,1,1,0,20000
100131,4010006,1,1,0,9000
100131,4020006,1,1,0,9000
100131,2040400,1,1,0,300
100131,2040618,1,1,0,300
100131,1002019,1,1,0,1500
100131,1002002,1,1,0,1500
100131,1040013,1,1,0,800
100131,1041012,1,1,0,800
100131,1060004,1,1,0,800
100131,1072005,1,1,0,800
100131,2380016,1,1,0,1000
100132,4000495,1,1,0,600000
100132,2000000,1,1,0,20000
100132,2010009,1,1,0,20000
100132,4010000,1,1,0,9000
100132,4020007,1,1,0,9000
100132,2040823,1,1,0,300
100132,2041018,1,1,0,300
100132,1002001,1,1,0,1500
100132,1002003,1,1,0,1500
100132,1040014,1,1,0,800
100132,1040015,1,1,0,800
100132,1060008,1,1,0,800
100132,1041014,1,1,0,800
100132,1061014,1,1,0,800
100132,1072004,1,1,0,800
100132,1082003,1,1,0,1000
100132,1442000,1,1,0,700
100132,2380017,1,1,0,1000
100133,4000496,1,1,0,600000
100133,2000000,1,1,0,20000
100133,2010009,1,1,0,20000
100133,4010001,1,1,0,9000
100133,4020003,1,1,0,9000
100133,2048000,1,1,0,300
100133,2041004,1,1,0,300
100133,1002041,1,1,0,1500
100133,1002007,1,1,0,1500
100133,1032001,1,1,0,1000
100133,1040038,1,1,0,800
100133,1060028,1,1,0,800
100133,1041064,1,1,0,800
LIST 2 (Not the entire list):
2000,4031161
2000,4031162
100130,2040003
100131,2040400
100133,2048000
100134,2040500
100134,2044400
130101,4031846
210100,4031273
851000,2290132
1110100,4020002
1110100,4031146
1110130,4000012
1110130,2043102
1110130,1092008
1110130,2048000
1110130,1002033
1110130,1302007
1110130,1032001
1110130,1412012
1110130,4032316
1130100,4031147
1140130,2048001
1140130,1412012
1140130,2044802
1210100,4031846
1210100,4032340
1210102,4032314
2100100,4020006
2100100,4010001
2100100,4010007
2100100,2040420
2100100,2049000
2100101,4010006
2100101,4020001
2100101,4010007
2100101,2044210
2100102,2043212
2100103,2044314
2100104,1452022
2100105,2040316
2100105,2040319
2100105,2044412
2100106,2040926
2100107,1382009
2100108,4010002
2100108,4010001
2100108,4010007
2100108,2044014
2100108,2044214
2110200,2043214
2110200,1452016
2110200,4032390
2110300,2043214
2110301,4010002
2110301,2043114
2130100,2044012
2130100,2044210
2130103,2040617
2220000,4010000
2220000,4020000
2220100,4020006
2230100,4020007
2230100,2040823
2230100,2044010
2230101,4010003
2230102,4031155
2230102,4007001
2230102,1462014
2230103,2040319
2230103,2044114
2230103,1382009
2230104,2040929
2230104,2043112
2230104,1452016
2230105,2040617
2230105,2043015
2230105,4031259
2230106,2040417
2230106,4031268
2230106,4031260
2230106,4031269
2230107,1092030
2230108,2040623
2230108,4031261
2230109,4010004
2230109,4031264
2230110,2044312
2230110,2044805
2230110,1472030
2230111,2049000
2230131,4000008
2230131,1050031
2230200,4031262
2300100,2043112
3000000,2040316
3000000,2040620
3000001,4000068
3000001,2000001
3000001,2000003
3000001,4020004
3000001,4010002
3000001,2050000
3000001,2050001
3000001,2050002
3000001,2050003
3000001,2050004
3100101,4010005
3100101,4020000
3100101,4010007
3100101,4130005
3100101,4130009
3100101,1332025
3110100,4130002
3110100,4130008
3110100,4130010
3110100,4007005
3110101,2044012
3110101,4130002
3110102,4131002
3110102,2044210
3110102,4130003
3110102,4130004
3110102,4130011
3110102,4031129
3110102,4007007
3110102,4007001
3110102,1302030
3110300,2040530
3110300,2044410
3110300,4130002
3110300,4130009
3110300,4130013
3110300,4007000
3110300,4007004
3110301,4010005
3110301,4020000
3110301,4010007
3110301,2040420
3110301,4130001
3110301,4130006
3110302,2040324
3110302,2044210
3110302,4130010
3110302,4130015
3110302,4031694
3110302,4007003
3110303,2040417
3110303,2044112
3110303,2044310
3110303,2044809
3110303,4130001
3110303,4130002
3110303,4130016
3110303,4031694
3110303,1472030
3210100,4010002
3210100,4130011
3210100,4130016
3210100,4130017
3210100,4007003
3210100,4007001
3210100,1382009
3210200,4130007
3210200,4130016
3210200,4007000
3210200,4007006
3210200,4007001
3210201,2043114
3210201,4130003
3210201,4130004
3210201,4130012
3210202,2043110
3210202,2044807
3210202,4130006
3210202,4130012
3210203,2040923
3210203,2043212
3210204,2040617
3210204,4130015
3210204,4130017
3210205,4130001
3210205,4130004
3210205,4130014
3210205,4031093
3210205,4007007
3210205,4007005
3210205,1412011
3210206,4130015
3210206,4130016
3210207,2049000
3210207,4130007
3210207,4130008
3210208,4130006
3210208,4130008
3210208,2382028
3210208,4031279
3210208,4007002
3210208,4007004
3210208,1452022
3210450,4130000
3210450,4130014
3210450,4130017
3210450,4007004
3210800,4020004
3210800,2044414
3210800,4130008
3210800,4130010
3210800,1452022
3220000,1322027
3220000,2044112
3220000,2044412
3230100,4130006
3230100,4130012
3230100,4130017
3230100,4031239
3230101,4130007
3230101,4130014
3230101,4007000
3230101,4007003
3230102,2040024
3230102,2040423
3230102,4130011
3230102,4130015
3230103,2044112
3230103,4130001
3230103,4130011
3230104,2044212
3230104,4130000
3230104,4130003
3230104,4130005
3230104,4031263
3230200,2044807
3230200,4130009
3230200,4130014
3230200,4031309
3230200,4007000
3230200,1432012
3230300,4000067
3230300,2000002
3230300,2000003
3230300,4020000
3230300,4010001
3230300,4004000
3230300,4004001
3230300,4004002
3230300,4004003
3230302,4130005
3230302,4130012
3230302,4130013
3230302,4031089
3230302,1422014
3230303,2044312
3230303,4130009
3230303,4130010
3230303,4130012
3230304,4010001
3230304,2040316
3230304,2049000
3230304,4130002
3230304,4130017
3230305,2040926
3230305,4130003
3230305,4130004
3230305,4130014
3230306,4130000
3230306,4130010
3230306,4007000
3230306,4007005
3230306,1472032
3230307,4010001
3230307,2040929
3230307,2044110
3230307,4130010
3230307,4130013
3230308,2043210
3230308,4130004
3230308,4130006
3230308,4130015
3230400,4130001
3230400,4130008
3230400,4031140
3230400,4031135
3230400,4007002
3230400,4007004
3230405,4131005
3230405,2044410
3230405,4130009
3230405,4130013
3300001,4130005
3300001,4130009
3300005,2043801
3300005,2044801
3300006,2040602
3300006,1041076
3300006,1072126
3300007,2040001
3300007,2040301
3300007,2043701
3300007,2043801
3300007,2040601
3300007,1041033
3300007,2040302
3300007,2044801
3300008,2040301
3300008,2043801
3300008,2044802
4110300,4130002
4110300,4130013
4110300,2382057
4110300,4007004
4110301,4130007
4110301,4130012
4110301,2382072
4110301,4007000
4110301,4007005
4110301,4007006
4110302,2000002
4110302,2000003
4110302,4020000
4110302,4020006
4110302,4130012
4110302,2044102
4110302,1372007
4110302,4006001
4110302,1040089
4110302,1050045
4110302,4004002
4110302,2040001
4110302,4000359
4110302,1082198
4110302,4007006
4110302,4007001
4130100,2040025
4130100,2040621
4130100,2044014
So, basically, I need to remove every line in LIST 1 that doesn't start with one of the lines from LIST 2. So, for example, the first line of LIST 2 is "2000,4031161", so I DO NOT want to remove "2000,4031161,1,1,1008,1000000". One line in LIST 1 is "100100,4000019,1,1,0,600000", and since there is no line in LIST 2 that says "100100,4000019", I want that line removed. The real list is a couple tens of thousands of lines long. I made a really REALLY long regex command that should have sorted it all for me using search and replace, but then I found out there was a 2048 character limit, and I'm interested in finding a better way to do this anyway.
I have no idea how to do this in Notepad++, but with gawk for Windows you can do this:
gawk -F"," 'NR==FNR{l[$0]++; next} {if ($1","$2 in l) print $0 }' file2 file1
The first block builds an array l of the entries in file2, the first file given as an argument (NR==FNR, i.e. the overall record number equals the per-file record number, is only true while the first file is being read), and then skips to the next record.
Once file2 has been processed, the second block is executed for each line of file1. As we use , as the field separator, we look up the first two fields, joined by a comma, as a key in l and print the line only if that key is present.
Arrays in awk are hash tables under the hood, so memory should not be a problem even for a very large number of lines in file2.
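For comparison, the same idea can be sketched in Python. This is only an illustration of the technique, not part of the gawk answer: the function name keep_matching is made up, and the lists are held in memory here, whereas in practice they would be read from the two files.

```python
def keep_matching(list1_lines, list2_lines):
    # Build a set of "field1,field2" keys from LIST 2 for fast membership tests.
    keys = set(line.strip() for line in list2_lines if line.strip())
    kept = []
    for line in list1_lines:
        fields = line.strip().split(',')
        # Compare only the first two comma-separated fields of each LIST 1 line.
        if ','.join(fields[:2]) in keys:
            kept.append(line)
    return kept

# A few lines taken from the two lists in the question.
list1 = [
    '2000,4031161,1,1,1008,1000000',
    '100100,4000019,1,1,0,600000',
    '100130,2040003,1,1,0,300',
]
list2 = ['2000,4031161', '2000,4031162', '100130,2040003']

result = keep_matching(list1, list2)
print(result)
```

As with the awk array, the set gives constant-time lookups on average, so the run time grows roughly linearly with the size of both lists.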