Replace a list of tuples based on condition - list

My program has two lists
List 1 [(x-coordinate,y-coordinate,List1_id)] (list of tuples)
List 2 [(x-coordinate,y-coordinate,List2_id)] (list of tuples)
I have code that computes the distance between every point in List 1 and every point in List 2. The resulting list is
ResultList = [(List1_id, List2_id, distance)]
Because the dataset is large, ResultList takes up about 4 GB. I don't need all the distances; I only need the minimum distance for each combination of list IDs.
So, my resulting list should be
[(List1_id1, List2_id1, min_distance),(List1_id1, List2_id2, min_distance),(List1_id2, List2_id1, min_distance) and so on..]
from
[(List1_id1, List2_id1, distance),(List1_id1, List2_id2, distance),(List1_id2, List2_id1, distance), (List1_id1, List2_id1, distance), (List1_id1, List2_id1, distance), (List1_id1, List2_id2, distance) and so on..]
I hope this makes sense :|
Currently, I calculate each distance and append it to a new list. I need logic so that an element is added only if its distance is the minimum seen so far for that combination of list IDs, replacing any earlier entry. That will keep the list from growing to gigabytes.
Edit 2 - Some data
List 1 - [(99363.044,441277.027,8),(99373.343,441272.8,8),(99354.918,441288.428,8),(99362.324,441291.766,8),(99360.426,441264.083,8),(99369.039,441287.165,5),(99364.127,441288.681,5)]
List 2 - [(99360.05,441264.68,42,0),(99360.05,441264.69,42,0),(99360.05,441264.7,42,0),(99360.15,441264.58,42,0),(99360.15,441264.62,42,0),(99361.26,441279.93,53,0),(99361.26,441280.14,53,0),(99361.26,441279.9,53,0),(99361.26,441279.81,53,0),(99354.55,441271.69,63,0),(99354.55,441275.66,63,0),(99354.55,441271.66,63,0),(99354.55,441275.64,63,0),(99354.55,441275.48,63,0),(99354.55,441270.59,63,0),(99354.55,441275.44,63,0),(99354.55,441271.06,63,0),(99354.55,441272.84,63,0),(99355.32,441273.42,63,0),(99355.32,441275.26,63,0),(99355.32,441274.48,63,0),(99355.32,441274.95,63,0),(99358.02,441284.35,68,0),(99358.02,441284.36,68,0),(99358.02,441284.4,68,0),(99358.02,441284.49,68,0),(99358.02,441283.97,68,0),(99358.16,441284.34,68,0),(99358.16,441284.36,68,0),(99358.16,441284,68,0)]
Obtained result
[(42, 5, 24.334954489354597), (42, 5, 24.387460917433867), (63, 5, 16.052616297632952), (42, 8, 12.74005043161773), (42, 8, 24.286417356220991), (42, 8, 0.56849362356923416), (42, 8, 24.276643260552106), (63, 8, 9.5884464330723027), (53, 8, 16.078644376965897), (42, 5, 24.196659810778854), (68, 5, 7.3672335377565004), (63, 8, 8.6337271789140431), (68, 8, 8.5137754257193521), (63, 5, 17.619914017943259), (42, 5, 24.27131117180317), (42, 5, 24.426926331373604), (63, 5, 22.01503454456742), (63, 8, 12.79329386827386), (68, 8, 8.8889518504529388), (53, 8, 14.029817853416182), (63, 8, 18.977524838601038), (53, 8, 10.700041495254004), (63, 8, 10.641936149028002), (68, 8, 20.383346266049983), (63, 8, 12.965015426166449), (42, 8, 12.695101614399849), (68, 5, 11.239807204741309), (68, 8, 8.5700539671281586), (63, 8, 10.047523326678597), (53, 8, 11.913607849829605), (53, 8, 3.5879555460002752), (42, 8, 27.161359170695093), (63, 8, 18.21630785860641), (68, 8, 20.40322241706173), (68, 5, 11.339046961713443), (53, 8, 3.4073545456772449), (63, 8, 21.556583959454493), (42, 8, 27.232913762585632), (63, 5, 18.684758119918691), (53, 8, 15.86893080835191), (63, 8, 17.930534069001059), (42, 5, 24.234094701462343), (42, 5, 24.344813205268306), (42, 8, 0.72254065632100728), (42, 8, 0.71402030785190995), (63, 8, 18.08240448612629), (63, 8, 17.902037649371213), (68, 8, 5.4310374699841022), (68, 8, 5.0840129818797895), (53, 8, 13.969215761813532), (63, 8, 8.6032973329956892), (68, 8, 8.8105814223397942), (68, 8, 8.4536732844420808), (63, 8, 18.827544954127116), (53, 8, 14.014595570349684), (68, 8, 20.45896686056269), (63, 5, 16.710225911093556), (63, 8, 18.0336609982418), (68, 8, 8.9051688361531518), (68, 8, 19.194476523198581), (63, 8, 9.1217270842880183), (42, 8, 15.571639252163617), (63, 5, 18.613713922803964), (63, 8, 19.637508294091823), (63, 5, 18.513690771945917), (68, 5, 11.360615564297584), (53, 8, 10.62768780119729), (42, 5, 24.215229629293152), (68, 8, 8.8807040824213157), (68, 
8, 5.1237181811943104), (53, 5, 10.623467701270917), (63, 8, 18.922498487236851), (42, 8, 27.181289005499924), (53, 5, 9.3227855279266194), (68, 8, 8.8119039940279258), (68, 5, 7.7129093084233054), (68, 8, 20.045489392894488), (53, 8, 10.436086814504577), (63, 8, 15.592343249160891), (53, 5, 9.2086747146265946), (68, 8, 20.419244966454769), (63, 8, 10.657532782006815), (53, 8, 3.3818316043413272), (42, 8, 12.779007981828714), (53, 8, 11.883728034595848), (53, 8, 11.67458658795881), (68, 8, 8.5744662807839234), (63, 5, 20.055387555449311), (63, 8, 10.031530541240986), (68, 8, 5.1157626997569583), (63, 8, 22.557882258733933), (63, 5, 20.374904809561844), (68, 5, 11.234797105421109), (63, 5, 16.309056686388619), (68, 8, 19.082900434672979), (42, 8, 27.171324075208528), (63, 5, 18.50124714715599), (53, 5, 9.237188425017159), (63, 8, 12.95322847790688), (63, 5, 16.31268248936852), (63, 5, 18.368919020974811), (63, 8, 17.371898226750695), (68, 8, 19.0707915147892), (42, 8, 24.296191635748073), (63, 8, 18.983130642747213), (53, 5, 10.481577457599492), (68, 5, 11.472854309196546), (42, 8, 15.523132705735247), (68, 8, 8.9964962624199476), (63, 5, 16.341450669976716), (68, 5, 7.4580433090399447), (68, 8, 19.273049291685261), (68, 8, 20.409314662665938), (63, 8, 9.4698999466550209), (63, 8, 15.01338296324063), (63, 8, 8.6064978359273443), (63, 8, 12.982846567676086), (68, 8, 8.9219787603546088), (63, 5, 18.164250218470507), (42, 8, 12.685383912192332), (63, 5, 20.46956789964668), (68, 8, 20.548345067189207), (68, 5, 7.4067489494241725), (63, 8, 8.5247067398202496), (63, 5, 19.504179295708195), (42, 8, 15.544248100166255), (68, 8, 5.4879639211748898), (53, 8, 12.003250892992714), (68, 8, 8.5132946031419188), (68, 8, 8.5658188166784157), (68, 8, 5.2018446728271828), (42, 8, 15.576849777784778), (42, 8, 24.376108959405205), (42, 8, 0.60377562059008572), (63, 8, 22.117267281472209), (63, 8, 11.583127600088826), (68, 8, 8.4963363869476058), (53, 8, 15.749097910686972), (68, 5, 
11.370417142742211), (63, 8, 8.640984029603004), (68, 8, 5.2175001676950679), (63, 5, 21.221124993745853), (63, 8, 7.9235386665163343), (42, 8, 27.272786289620047), (63, 8, 13.953791886111953), (63, 8, 18.19011074731587), (63, 8, 17.884023932012049), (63, 8, 20.46041426753143), (63, 8, 18.101130600034072), (68, 5, 7.3789816370214911), (63, 8, 10.545730178644387), (63, 8, 18.793042568984102), (68, 5, 11.372888199574382), (63, 8, 12.288065958504605), (63, 5, 21.663382607511704), (63, 8, 21.52860543555683), (68, 5, 11.330042630100078), (63, 8, 10.380420270870811), (68, 8, 19.188455617881456), (63, 8, 17.841795537434088), (42, 8, 12.704819754729604), (63, 8, 12.773302157274143), (63, 8, 16.742044916927171), (63, 8, 13.174134810306763), (63, 5, 19.530319249824927), (63, 8, 12.787056932713549), (68, 8, 20.032014501780179), (53, 8, 10.603629944515824), (53, 5, 10.705553045026424), (42, 8, 24.41517822994124), (53, 8, 3.3057139924644763), (63, 8, 8.7674640005297135), (42, 5, 24.325095888783526), (63, 8, 18.825752282435211), (68, 8, 18.866994699738399), (63, 8, 18.873379374124106), (63, 5, 18.638850447371869), (42, 5, 24.205944435179109), (42, 8, 15.566433406511978), (68, 5, 7.5839864187451367), (63, 8, 19.006379165948232), (68, 8, 5.0130078795250803), (63, 8, 18.651053911255637), (63, 8, 8.1331042658944277), (68, 8, 8.793942517452491), (63, 8, 16.77203768184846), (53, 5, 10.643921551735795), (42, 8, 0.70553880121043167), (63, 8, 18.046298013735743), (53, 5, 9.0093490330572585), (63, 8, 9.6121706705780792), (68, 5, 7.4810754574348568), (63, 5, 19.419989340877574), (68, 8, 18.96215254129423), (63, 8, 12.993212381867547), (63, 8, 12.82259665590326), (63, 5, 16.163705330153071), (63, 5, 16.179821074381103), (68, 5, 7.4868558153549847), (63, 8, 7.9983813987434456), (68, 8, 8.5312585237825154), (53, 8, 14.13769744337263), (63, 8, 12.006786622597852), (63, 5, 18.510975392951089), (68, 8, 19.218593314821081), (53, 8, 15.838972346753351), (63, 8, 18.150785905839268), (63, 5, 
21.19921569302317), (63, 8, 19.009377922477256), (63, 8, 13.483993770395951)]
Expected result
[(42, 5, 24.334954489354597),(63, 5, 16.052616297632952), (42, 8, 12.74005043161773),(63, 8, 9.5884464330723027), (53, 8, 16.078644376965897),(68, 8, 8.5137754257193521),(53, 5, 10.623467701270917),(68, 5, 7.3672335377565004)]
Of course in the expected result, the values don't reflect the true minimum but they should be from the obtained result. I hope you get what I mean

obtained_result = [(42, 5, 24.334954489354597),
(42, 5, 24.387460917433867),
(63, 5, 16.052616297632952),
#...
]
from itertools import groupby
expected_result = [(k[0], k[1], min(x[2] for x in v))
                   for k, v in groupby(sorted(obtained_result), key=lambda x: (x[0], x[1]))]
The list comprehension first sorts obtained_result so that groupby can group it using (x[0], x[1]) as the key. We then take the minimum distance within each group and use it to build the new list of tuples.
This gave us:
>>> expected_result
[(42, 5, 24.196659810778854),
(42, 8, 0.5684936235692342),
(53, 5, 9.009349033057259),
(53, 8, 3.3057139924644763),
(63, 5, 16.05261629763295),
(63, 8, 7.923538666516334),
(68, 5, 7.3672335377565),
(68, 8, 5.01300787952508)]
Is it what you want? Anyhow, I think dict would be a better choice for your data.

Finally! Got the code running to give me what I want!
import numpy
expected_list = []
for b1, b2 in enumerate(list1):
    for l1, l2 in enumerate(list2):
        replace = None
        bp1 = list1[b1][0]
        bp2 = list1[b1][1]
        lp1 = list2[l1][0]
        lp2 = list2[l1][1]
        # Euclidean distance between the two points
        d = numpy.sqrt((bp1 - lp1) * (bp1 - lp1) + (bp2 - lp2) * (bp2 - lp2))
        if len(expected_list) == 0:
            expected_list.append((int(list2[l1][2]), int(list1[b1][2]), d))
        else:
            # Look for an existing entry with the same pair of IDs
            for d1, d2 in enumerate(expected_list):
                if expected_list[d1][0] == int(list2[l1][2]) and expected_list[d1][1] == int(list1[b1][2]):
                    min_distance = min(expected_list[d1][2], d)
                    replace = d1
            if replace is None:
                expected_list.append((int(list2[l1][2]), int(list1[b1][2]), d))
            else:
                expected_list[replace] = (int(list2[l1][2]), int(list1[b1][2]), min_distance)
print(expected_list)
If anybody can help me optimize the code, I will be grateful :)
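One possible optimization, following the earlier suggestion to use a dict: keep only the running minimum per ID pair, so the inner search over expected_list disappears. This is a minimal sketch, not a drop-in replacement. The function name min_distances and the small sample points are made up for illustration, and it assumes each row starts with x, y, id (extra trailing fields are ignored).

```python
import math

def min_distances(list1, list2):
    """Return {(list2_id, list1_id): smallest distance} without storing every pair."""
    best = {}
    for x1, y1, id1 in list1:
        for x2, y2, id2, *_ in list2:  # list2 rows may carry extra fields
            d = math.hypot(x1 - x2, y1 - y2)
            key = (int(id2), int(id1))
            # Keep the distance only if it beats the current minimum for this pair
            if key not in best or d < best[key]:
                best[key] = d
    return best

# Hypothetical sample data in the same layout as the question
list1 = [(0.0, 0.0, 8), (10.0, 0.0, 8), (0.0, 5.0, 5)]
list2 = [(3.0, 4.0, 42, 0), (10.0, 1.0, 42, 0), (0.0, 6.0, 53, 0)]

result = [(k[0], k[1], d) for k, d in sorted(min_distances(list1, list2).items())]
print(result)
```

The dict lookup is constant time on average, so memory grows with the number of distinct ID pairs rather than with the number of point pairs.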


Finding combinations where the length of the combination is more then the variables being used

I'm trying to get combinations of length 5 out of a list of 2 elements, but I can't find anything that works.
x = [5,7]
abc = list(itertools.combinations((x),5))
All I get is []
I'm hoping to get every possible combination of [5,7] with a length of 5, like [5,7,7,5,7].
It seems possible; I've tried a lot of different things.
Once again, thanks for all the help.
The reason you get [] is, as the title suggests, that you are asking for a length greater than the number of elements. The documentation says:
itertools.combinations(iterable, r):
Return r length subsequences of elements from the input iterable.
I guess what you need is another function (next paragraph in the doc):
>>> x = [5, 7]
>>> list(itertools.combinations_with_replacement(x, 5))
[(5, 5, 5, 5, 5), (5, 5, 5, 5, 7), (5, 5, 5, 7, 7), (5, 5, 7, 7, 7), (5, 7, 7, 7, 7), (7, 7, 7, 7, 7)]
Or, as your example suggests, maybe you do not want combinations but permutations? The problem is that itertools.permutations has no with-replacement counterpart the way combinations does. But maybe a Cartesian product will do the trick?
>>> list(itertools.product(x, repeat=5))
[(5, 5, 5, 5, 5), (5, 5, 5, 5, 7), (5, 5, 5, 7, 5), (5, 5, 5, 7, 7), (5, 5, 7, 5, 5), (5, 5, 7, 5, 7), (5, 5, 7, 7, 5), (5, 5, 7, 7, 7), (5, 7, 5, 5, 5), (5, 7, 5, 5, 7), (5, 7, 5, 7, 5), (5, 7, 5, 7, 7), (5, 7, 7, 5, 5), (5, 7, 7, 5, 7), (5, 7, 7, 7, 5), (5, 7, 7, 7, 7), (7, 5, 5, 5, 5), (7, 5, 5, 5, 7), (7, 5, 5, 7, 5), (7, 5, 5, 7, 7), (7, 5, 7, 5, 5), (7, 5, 7, 5, 7), (7, 5, 7, 7, 5), (7, 5, 7, 7, 7), (7, 7, 5, 5, 5), (7, 7, 5, 5, 7), (7, 7, 5, 7, 5), (7, 7, 5, 7, 7), (7, 7, 7, 5, 5), (7, 7, 7, 5, 7), (7, 7, 7, 7, 5), (7, 7, 7, 7, 7)]
EDIT: isn't your question really really close to this one: python all possible combinations of 0,1 of length k
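To make the difference between the two suggestions concrete, here is a small sketch that compares their sizes with the usual counting formulas. It assumes Python 3.8+ for math.comb; the variable names unordered and ordered are just for illustration.

```python
from itertools import combinations_with_replacement, product
from math import comb

x = [5, 7]
r = 5

unordered = list(combinations_with_replacement(x, r))  # order ignored
ordered = list(product(x, repeat=r))                   # order matters

# C(n + r - 1, r) multisets versus n ** r ordered tuples
print(len(unordered), comb(len(x) + r - 1, r))
print(len(ordered), len(x) ** r)

# The tuple from the question only shows up when order matters
print((5, 7, 7, 5, 7) in ordered, (5, 7, 7, 5, 7) in unordered)
```

Since the question lists [5,7,7,5,7] as a wanted result, product looks like the closer fit.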

Find and print equals in a list

Given a list of 50 random integers in the range [n, k], where n is less than k, I would like to find
how many numbers are equal to each other and print them.
This can be done with Tally as follows.
First, let's generate a test list:
list = RandomInteger[{5, 10}, 50]
(* ==> {10, 7, 5, 7, 10, 8, 6, 6, 7, 6, 6, 8, 7, 5, 6, 9, 10, 6,
9, 6, 10, 8, 10, 8, 9, 7, 5, 9, 8, 5, 9, 7, 5, 7, 9, 10,
6, 6, 7, 7, 5, 6, 9, 10, 5, 6, 6, 6, 10, 9}
*)
Then count them:
Tally[list]
(* ==> {{10, 8}, {7, 9}, {5, 7}, {8, 5}, {6, 13}, {9, 8}} *)
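For readers following along in Python rather than Mathematica, a rough analogue of Tally is collections.Counter. This is only a sketch for comparison; it reuses the 50 numbers printed above instead of drawing new random ones, and the name values is made up.

```python
from collections import Counter

# The same 50 integers as in the generated test list above
values = [10, 7, 5, 7, 10, 8, 6, 6, 7, 6, 6, 8, 7, 5, 6, 9, 10, 6,
          9, 6, 10, 8, 10, 8, 9, 7, 5, 9, 8, 5, 9, 7, 5, 7, 9, 10,
          6, 6, 7, 7, 5, 6, 9, 10, 5, 6, 6, 6, 10, 9]

tally = Counter(values)  # maps each distinct value to its number of occurrences

# Counter keeps first-seen order (Python 3.7+), like Tally does
for value, count in tally.items():
    print(value, count)
```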

Weka data load error

I want to load the data in breast-cancer-wisconsin through Weka Explorer as a C4.5 data file and I'm getting the following errors when choosing both to load C4.5 .data and C4.5 .names:
Any ideas?
It does not look like the C45 names file is correct. Try replacing breast-cancer-wisconsin.names with this one:
2, 4.
clump: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
size: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
shape: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
adhesion: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
epithelial: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
nuclei: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
chromatin: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
nucleoli: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
mitoses: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
Note that class comes first (only labels).
Here I have removed the first column of subjects' id in the original dataset using
$ cut -d, -f2-11 breast-cancer-wisconsin.data > tmp.data && mv tmp.data breast-cancer-wisconsin.data
but it is not difficult to adapt the above code.
Alternative solutions:
Generate a csv file: you just need to add a header to the *.data file and rename it as *.csv. E.g., replace breast-cancer-wisconsin.data with breast-cancer-wisconsin.csv which should look like
clump,size,shape,adhesion,epithelial,nuclei,chromatin,nucleoli,mitoses,class
5,1,1,1,2,1,3,1,1,2
5,4,4,5,7,10,3,2,1,2
3,1,1,1,2,2,3,1,1,2
6,8,8,1,3,4,3,7,1,2
...
Construct directly an *.arff file by hand; that's not really complicated as there are few variables. An example file can be found here.
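The csv alternative can also be scripted. The following is a minimal sketch, assuming the original 11-column layout (sample ID first, class last) and the column names used above; data_to_csv is a hypothetical helper and the two demo rows are only placeholders.

```python
import csv
import io

HEADER = ["clump", "size", "shape", "adhesion", "epithelial",
          "nuclei", "chromatin", "nucleoli", "mitoses", "class"]

def data_to_csv(src, dst):
    """Copy rows from src to dst, dropping the leading ID column and adding a header."""
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(HEADER)
    for row in csv.reader(src):
        if row:                        # skip blank lines
            writer.writerow(row[1:])   # drop the sample ID

# Demo on two placeholder rows held in memory
sample = io.StringIO("1000025,5,1,1,1,2,1,3,1,1,2\n1002945,5,4,4,5,7,10,3,2,1,2\n")
out = io.StringIO()
data_to_csv(sample, out)
print(out.getvalue())
```

With real files, src and dst would be handles opened on the .data and .csv files (use newline="" when opening the output, as the csv module recommends).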

Optimally picking one element from each list

I came across an old problem that you Mathematica/StackOverflow folks will probably like and that seems valuable to have on StackOverflow for posterity.
Suppose you have a list of lists and you want to pick one element from each and put them in a new list so that the number of elements that are identical to their next neighbor is maximized.
In other words, for the resulting list l, minimize Length@Split[l].
In yet other words, we want the list with the fewest interruptions of identical contiguous elements.
For example:
pick[{ {1,2,3}, {2,3}, {1}, {1,3,4}, {4,1} }]
--> { 2, 2, 1, 1, 1 }
(Or {3,3,1,1,1} is equally good.)
Here's a preposterously brute force solution:
pick[x_] := argMax[-Length@Split[#]&, Tuples[x]]
where argMax is as described here:
posmax: like argmax but gives the position(s) of the element x for which f[x] is maximal
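For readers who do not use Mathematica, the same brute-force idea can be sketched in Python. The names run_count and pick_brute_force are made up here; the sketch enumerates every tuple, so it is only usable on tiny inputs.

```python
from itertools import groupby, product

def run_count(seq):
    """Number of maximal runs of equal adjacent elements (Length@Split in the post)."""
    return sum(1 for _ in groupby(seq))

def pick_brute_force(lists):
    """Try every way of choosing one element per sublist; keep one with the fewest runs."""
    return min(product(*lists), key=run_count)

example = [[1, 2, 3], [2, 3], [1], [1, 3, 4], [4, 1]]
best = pick_brute_force(example)
print(best, run_count(best))
```

The number of candidates is the product of the sublist lengths, which is why this approach stops being practical almost immediately.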
Can you come up with something better?
The legendary Carl Woll nailed this for me and I'll reveal his solution in a week.
Not an answer, but a comparison of the methods proposed here. I generated test sets with a variable number of subsets, varying from 5 to 100. Each test set was generated with this code
Table[RandomSample[Range[10], RandomInteger[{1, 7}]], {rl}]
with rl the number of subsets involved.
For every test set that was generated this way I had all the algorithms do their thing. I did this 10 times (with the same test set) with the algorithms operating in a random order so as to level out order effects and the effects of random background processes on my laptop. This results in mean timing for the given data set. The above line was used 20 times for each rl length, from which a mean (of means) and a standard deviation were calculated.
The results are below (horizontally the number of subsets and vertically the mean AbsoluteTiming):
It seems that Mr.Wizard is the (not so clear) winner. Congrats!
Update
As requested by Timo, here are the timings as a function of the number of distinct subset elements that can be chosen from, as well as the maximum number of elements in each subset. The data sets are generated for a fixed number of subsets (50) according to this line of code:
lst = Table[RandomSample[Range[ch], RandomInteger[{1, ch}]], {50}];
I also increased the number of datasets I tried for each value from 20 to 40.
Here for 5 subsets:
I'll toss this into the ring. I am not certain it always gives an optimal solution, but it appears to work on the same logic as some other answers given, and it is fast.
f@{} := (Sow[m]; m = {i, 1})
f@x_ := m = {x, m[[2]] + 1}
findruns[lst_] :=
Reap[m = {{}, 0}; f[m[[1]] ⋂ i] ~Do~ {i, lst}; Sow@m][[2, 1, 2 ;;]]
findruns gives run-length-encoded output, including parallel answers. If output as strictly specified is required, use:
Flatten[First[#]~ConstantArray~#2 & @@@ #] &
Here is a variation using Fold. It is faster on some set shapes, but a little slower on others.
f2[{}, m_, i_] := (Sow[m]; {i, 1})
f2[x_, m_, _] := {x, m[[2]] + 1}
findruns2[lst_] :=
Reap[Sow@Fold[f2[#[[1]] ⋂ #2, ##] &, {{}, 0}, lst]][[2, 1, 2 ;;]]
This is my take on it. It does pretty much the same thing as Sjoerd's, just with less code.
LongestRuns[list_List] :=
Block[{gr, f = Intersection},
ReplaceRepeated[
list, {a___gr, Longest[e__List] /; f[e] =!= {}, b___} :> {a,
gr[e], b}] /.
gr[e__] :> ConstantArray[First[f[e]], Length[{e}]]]
Some gallery:
In[497]:= LongestRuns[{{1, 2, 3}, {2, 3}, {1}, {1, 3, 4}, {4, 1}}]
Out[497]= {{2, 2}, {1, 1, 1}}
In[498]:= LongestRuns[{{3, 10, 6}, {8, 2, 10, 5, 9, 3, 6}, {3, 7, 10,
2, 8, 5, 9}, {6, 9, 1, 8, 3, 10}, {1}, {2, 9, 4}, {9, 5, 2, 6, 8,
7}, {6, 9, 4, 5}}]
Out[498]= {{3, 3, 3, 3}, {1}, {9, 9, 9}}
In[499]:= pickPath[{{3, 10, 6}, {8, 2, 10, 5, 9, 3, 6}, {3, 7, 10, 2,
8, 5, 9}, {6, 9, 1, 8, 3, 10}, {1}, {2, 9, 4}, {9, 5, 2, 6, 8,
7}, {6, 9, 4, 5}}]
Out[499]= {{10, 10, 10, 10}, {{1}, {9, 9, 9}}}
In[500]:= LongestRuns[{{2, 8}, {4, 2}, {3}, {9, 4, 6, 8, 2}, {5}, {8,
10, 6, 2, 3}, {9, 4, 6, 3, 10, 1}, {9}}]
Out[500]= {{2, 2}, {3}, {2}, {5}, {3, 3}, {9}}
In[501]:= LongestRuns[{{4, 6, 18, 15}, {1, 20, 16, 7, 14, 2, 9}, {12,
3, 15}, {17, 6, 13, 10, 3, 19}, {1, 15, 2, 19}, {5, 17, 3, 6,
14}, {5, 17, 9}, {15, 9, 19, 13, 8, 20}, {18, 13, 5}, {11, 5, 1,
12, 2}, {10, 4, 7}, {1, 2, 14, 9, 12, 3}, {9, 5, 19, 8}, {14, 1, 3,
4, 9}, {11, 13, 5, 1}, {16, 3, 7, 12, 14, 9}, {7, 4, 17, 18,
6}, {17, 19, 9}, {7, 15, 3, 12}, {19, 12, 5, 14, 8}, {1, 10, 12,
8}, {18, 16, 14, 19}, {2, 7, 10}, {19, 2, 5, 3}, {16, 17, 3}, {16,
2, 6, 20, 1, 3}, {12, 18, 11, 19, 17}, {12, 16, 9, 20, 4}, {19, 20,
10, 12, 9, 11}, {10, 12, 6, 19, 17, 5}}]
Out[501]= {{4}, {1}, {3, 3}, {1}, {5, 5}, {13, 13}, {1}, {4}, {9, 9,
9}, {1}, {7, 7}, {9}, {12, 12, 12}, {14}, {2, 2}, {3, 3}, {12, 12,
12, 12}}
EDIT: given that Dreeves's brute force approach fails on large samples because it cannot generate all Tuples at once, here is another brute force approach:
bfBestPick[e_List] := Block[{splits, gr, f = Intersection},
splits[{}] = {{}};
splits[list_List] :=
ReplaceList[
list, {a___gr, el__List /; f[el] =!= {},
b___} :> (Join[{a, gr[el]}, #] & /@ splits[{b}])];
Module[{sp =
Cases[splits[
e] //. {seq__gr,
re__List} :> (Join[{seq}, #] & /@ {re}), {__gr}, Infinity]},
sp[[First@Ordering[Length /@ sp, 1]]] /.
gr[args__] :> ConstantArray[First[f[args]], Length[{args}]]]]
This brute-force-best-pick might generate different splitting, but it is length that matters according to the original question.
test = {{4, 6, 18, 15}, {1, 20, 16, 7, 14, 2, 9}, {12, 3, 15}, {17, 6,
13, 10, 3, 19}, {1, 15, 2, 19}, {5, 17, 3, 6, 14}, {5, 17,
9}, {15, 9, 19, 13, 8, 20}, {18, 13, 5}, {11, 5, 1, 12, 2}, {10,
4, 7}, {1, 2, 14, 9, 12, 3}, {9, 5, 19, 8}, {14, 1, 3, 4, 9}, {11,
13, 5, 1}, {16, 3, 7, 12, 14, 9}, {7, 4, 17, 18, 6}, {17, 19,
9}, {7, 15, 3, 12}, {19, 12, 5, 14, 8}, {1, 10, 12, 8}, {18, 16,
14, 19}, {2, 7, 10}, {19, 2, 5, 3}, {16, 17, 3}, {16, 2, 6, 20, 1,
3}, {12, 18, 11, 19, 17}, {12, 16, 9, 20, 4}, {19, 20, 10, 12, 9,
11}, {10, 12, 6, 19, 17, 5}};
pick fails on this example.
In[637]:= Length[bfBestPick[test]] // Timing
Out[637]= {58.407, 17}
In[638]:= Length[LongestRuns[test]] // Timing
Out[638]= {0., 17}
In[639]:=
Length[Cases[pickPath[test], {__Integer}, Infinity]] // Timing
Out[639]= {0., 17}
I am posting this in case somebody wants to search for counterexamples to the claim that code like pickPath or LongestRuns always generates a sequence with the smallest number of interruptions.
Here's a go at it...
runsByN: For each number, show whether it appears or not in each sublist
list= {{4, 2, 7, 5, 1, 9, 10}, {10, 1, 8, 3, 2, 7}, {9, 2, 7, 3, 6, 4, 5}, {10, 3, 6, 4, 8, 7}, {7}, {3, 1, 8, 2, 4, 7, 10, 6}, {7, 6}, {10, 2, 8, 5, 6, 9, 7, 3}, {1, 4, 8}, {5, 6, 1}, {3, 2, 1}, {10,6, 4}, {10, 7, 3}, {10, 2, 4}, {1, 3, 5, 9, 7, 4, 2, 8}, {7, 1, 3}, {5, 7, 1, 10, 2, 3, 6, 8}, {10, 8, 3, 6, 9, 4, 5, 7}, {3, 10, 5}, {1}, {7, 9, 1, 6, 2, 4}, {9, 7, 6, 2}, {5, 6, 9, 7}, {1, 5}, {1,9, 7, 5, 4}, {5, 4, 9, 3, 1, 7, 6, 8}, {6}, {10}, {6}, {7, 9}};
runsByN = Transpose[Table[If[MemberQ[#, n], n, 0], {n, Max[list]}] & /@ list]
Out = {{1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0,1, 1, 1, 0, 0, 0, 0}, {2, 2, 2, 0, 0, 2, 0, 2, 0, 0, 2, 0, 0, 2, 2,0, 2, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0}, {0, 3, 3, 3, 0, 3, 0,3, 0, 0, 3, 0, 3, 0, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0,0}, {4, 0, 4, 4, 0, 4, 0, 0, 4, 0, 0, 4, 0, 4, 4, 0, 0, 4, 0, 0, 4, 0, 0, 0, 4, 4, 0, 0, 0, 0}, {5, 0, 5, 0, 0, 0, 0, 5, 0, 5, 0, 0, 0, 0, 5, 0, 5, 5, 5, 0, 0, 0, 5, 5, 5, 5, 0, 0, 0, 0}, {0, 0, 6, 6, 0, 6, 6, 6, 0, 6, 0, 6, 0, 0, 0, 0, 6, 6, 0, 0, 6, 6, 6, 0, 0, 6, 6, 0,6, 0}, {7, 7, 7, 7, 7, 7, 7, 7, 0, 0, 0, 0, 7, 0, 7, 7, 7, 7, 0, 0, 7, 7, 7, 0, 7, 7, 0, 0, 0, 7}, {0, 8, 0, 8, 0, 8, 0, 8, 8, 0, 0, 0, 0, 0, 8, 0, 8, 8, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0}, {9, 0, 9, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 9, 0, 0, 9, 0, 0, 9, 9, 9, 0, 9, 9, 0, 0, 0, 9}, {10, 10, 0, 10, 0, 10, 0, 10, 0, 0, 0, 10, 10, 10, 0, 0, 10, 10, 10, 0, 0, 0, 0, 0, 0, 0, 0, 10, 0, 0}};
runsByN is list transposed, with zeros inserted to represent missing numbers. It shows the sublists in which 1, 2, 3, and 4 appeared.
myPick: Picking numbers that constitute an optimal path
myPick recursively builds a list of the longest runs. It doesn't look for all optimal solutions, but rather the first solution of minimal length.
myPick[{}, c_] := Flatten[c]
myPick[l_, c_: {}] :=
Module[{r = Length /@ (l /. {x___, 0, ___} :> {x}), m}, m = Max[r];
myPick[Cases[(Drop[#, m]) & /@ l, Except[{}]],
Append[c, Table[Position[r, m, 1, 1][[1, 1]], {m}]]]]
choices = myPick[runsByN]
(* Out= {7, 7, 7, 7, 7, 7, 7, 7, 1, 1, 1, 10, 10, 10, 3, 3, 3, 3, 3, 1, 1, 6, 6, 1, 1, 1, 6, 10, 6, 7} *)
Thanks to Mr.Wizard for suggesting the use of a replacement rule as an efficient alternative to TakeWhile.
Epilog: Visualizing the solution path
runsPlot[choices1_, runsN_] :=
Module[{runs = {First[#], Length[#]} & /@ Split[choices1], myArrow,
m = Max[runsN]},
myArrow[runs1_] :=
Module[{data1 = Reverse@First[runs1], data2 = Reverse[runs1[[2]]],
deltaX},
deltaX := data2[[1]] - 1;
myA[{}, _, out_] := out;
myA[inL_, deltaX_, outL_] :=
Module[{data3 = outL[[-1, 1, 2]]},
myA[Drop[inL, 1], inL[[1, 2]] - 1,
Append[outL, Arrow[{{First[data3] + deltaX,
data3[[2]]}, {First[data3] + deltaX + 1, inL[[1, 1]]}}]]]];
myA[Drop[runs1, 2], deltaX, {Thickness[.005],
Arrow[{data1, {First[data1] + 1, data2[[2]]}}]}]];
ListPlot[runsN,
Epilog -> myArrow[runs],
PlotStyle -> PointSize[Large],
Frame -> True,
PlotRange -> {{1, Length[choices1]}, {1, m}},
FrameTicks -> {All, Range[m]},
PlotRangePadding -> .5,
FrameLabel -> {"Sublist", "Number", "Sublist", "Number"},
GridLines :> {FoldList[Plus, 0, Length /@ Split[choices1]], None}
]];
runsPlot[choices, runsByN]
The chart below represents the data from list.
Each plotted point corresponds to a number and the sublist in which it occurred.
So here is my "one liner" with improvements by Mr.Wizard:
pickPath[lst_List] :=
Module[{M = Fold[{#2, #} &, {{}}, Reverse@lst]},
Reap[While[M != {{}},
Do[Sow@#[[-2,1]], {Length@# - 1}] &@
NestWhileList[# ⋂ First[M = Last@M] &, M[[1]], # != {} &]
]][[2, 1]]
]
It basically uses intersection repeatedly on consecutive lists until it comes up empty, and then does it again and again. In a humongous torture test case with
M = Table[RandomSample[Range[1000], RandomInteger[{1, 200}]], {1000}];
I get Timing[] consistently around 0.032 on my 2GHz Core 2 Duo.
Below this point is my first attempt, which I'll leave for your perusal.
For a given list of lists of elements M we count the different elements and the number of lists, list the different elements in canonical order, and construct a matrix K[i,j] detailing the presence of element i in list j:
elements = Length@(Union @@ M);
lists = Length@M;
eList = Union @@ M;
positions = Flatten@Table[{i, Sequence @@ First@Position[eList, M[[i,j]]]} -> 1,
{i, lists},
{j, Length@M[[i]]}];
K = Transpose@Normal@SparseArray@positions;
The problem is now equivalent to traversing this matrix from left to right, by only stepping on 1's, and changing rows as few times as possible.
To achieve this I Sort the rows, take the one with the most consecutive 1's at the start, keep track of what element I picked, Drop that many columns from K and repeat:
R = {};
While[Length@K[[1]] > 0,
len = LengthWhile[K[[row = Last@Ordering@K]], # == 1 &];
Do[AppendTo[R, eList[[row]]], {len}];
K = Drop[#, len] & /@ K;
]
This has an AbsoluteTiming of approximately three times that of Sjoerd's approach.
My solution is based on the observation that 'greed is good' here. If I have the choice between interrupting a chain and beginning a new, potentially long chain, picking the new one to continue doesn't do me any good. The new chain gets longer with the same amount as the old chain gets shorter.
So, what the algorithm basically does is starting at the first sublist and for each of its members finding the number of additional sublists that have the same member and choosing the sublist member that has the most neighboring twins. This process then continues at the sublist at the end of this first chain and so on.
So combining this in a recursive algorithm we end up with:
pickPath[lst_] :=
Module[{lengthChoices, bestElement},
lengthChoices =
LengthWhile[lst, Function[{lstMember}, MemberQ[lstMember, #]]] & /@ First[lst];
bestElement = Ordering[lengthChoices][[-1]];
If[ Length[lst] == lengthChoices[[bestElement]],
ConstantArray[lst[[1, bestElement]], lengthChoices[[bestElement]]],
{
ConstantArray[lst[[1, bestElement]], lengthChoices[[bestElement]]],
pickPath[lst[[lengthChoices[[bestElement]] + 1 ;; -1]]]
}
]
]
Test
In[12]:= lst =
Table[RandomSample[Range[10], RandomInteger[{1, 7}]], {8}]
Out[12]= {{3, 10, 6}, {8, 2, 10, 5, 9, 3, 6}, {3, 7, 10, 2, 8, 5,
9}, {6, 9, 1, 8, 3, 10}, {1}, {2, 9, 4}, {9, 5, 2, 6, 8, 7}, {6, 9,
4, 5}}
In[13]:= pickPath[lst] // Flatten // AbsoluteTiming
Out[13]= {0.0020001, {10, 10, 10, 10, 1, 9, 9, 9}}
Dreeves' Brute Force approach
argMax[f_, dom_List] :=
Module[{g}, g[e___] := g[e] = f[e];(*memoize*) dom[[Ordering[g /@ dom, -1]]]]
pick[x_] := argMax[-Length@Split[#] &, Tuples[x]]
In[14]:= pick[lst] // AbsoluteTiming
Out[14]= {0.7340420, {{10, 10, 10, 10, 1, 9, 9, 9}}}
The first time, I used a slightly longer test list. The brute force approach brought my computer to a virtual standstill, claiming all the memory it had. Pretty bad. I had to restart after 10 minutes. Restarting took me another quarter of an hour, because the PC had become extremely unresponsive.
Could use integer linear programming. Here is code for that.
bestPick[lists_] := Module[
{picks, span, diffs, v, dv, vars, diffvars, fvars,
c1, c2, c3, c4, constraints, obj, res},
span = Max[lists] - Min[lists];
vars = MapIndexed[v[Sequence @@ #2] &, lists, {2}];
picks = Total[vars*lists, {2}];
diffs = Differences[picks];
diffvars = Array[dv, Length[diffs]];
fvars = Flatten[{vars, diffvars}];
c1 = Map[Total[#] == 1 &, vars];
c2 = Map[0 <= # <= 1 &, fvars];
c3 = Thread[span*diffvars >= diffs];
c4 = Thread[span*diffvars >= -diffs];
constraints = Join[c1, c2, c3, c4];
obj = Total[diffvars];
res = Minimize[{obj, constraints}, fvars, Integers];
{res[[1]], Flatten[vars*lists /. res[[2]] /. 0 :> Sequence[]]}
]
Your example:
lists = {{1, 2, 3}, {2, 3}, {1}, {1, 3, 4}, {4, 1}}
bestPick[lists]
Out[88]= {1, {2, 2, 1, 1, 1}}
For larger problems Minimize might run into trouble since it uses exact methods for solving relaxed LPs. In which case you might need to switch to NMinimize, and change the domain argument to a constraint of the form Element[fvars,Integers].
Daniel Lichtblau
A week is up! Here is the fabled solution from Carl Woll. (I tried to get him to post it himself. Carl, if you come across this and want to take official credit, just paste it in as a separate answer and I'll delete this one!)
pick[data_] := Module[{common,tmp},
common = {};
tmp = Reverse[If[(common = Intersection[common,#])=={}, common = #, common]& /@
data];
common = .;
Reverse[If[MemberQ[#, common], common, common = First[#]]& /@ tmp]]
Still quoting Carl:
Basically, you start at the beginning, and find the element which gives you
the longest string of common elements. Once the string can no longer be
extended, start a new string. It seems to me that this algorithm ought to
give you a correct answer (there are many correct answers).
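The idea Carl describes (extend a run for as long as the sublists still share an element, then start a new run) can be sketched in Python as follows. This is not a translation of his code: the function name pick_greedy is made up, ties are broken by taking the smallest shared element, and every sublist is assumed to be non-empty.

```python
def pick_greedy(lists):
    """Greedy sketch: extend the current run while the sublists still share an element."""
    result = []
    common = set()    # elements shared by every sublist in the current run
    run_length = 0
    for sub in lists:
        shared = common & set(sub)
        if shared:
            # The run can be extended through this sublist
            common = shared
            run_length += 1
        else:
            # Close the current run (if any) and start a new one here
            if run_length:
                result.extend([min(common)] * run_length)  # any shared element works
            common = set(sub)
            run_length = 1
    if run_length:
        result.extend([min(common)] * run_length)
    return result

example = [[1, 2, 3], [2, 3], [1], [1, 3, 4], [4, 1]]
print(pick_greedy(example))
```

On the small example from the question this yields two runs, the same count the brute-force approach finds. The sketch does not prove optimality in general.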

Unknown error in array initialization: invalid in-class initialization of static data member of non-integral type `const unsigned char[256]'

I was trying to make an Intel 8080 CPU emulator (then I'd like to emulate Space Invaders, which uses it).
I coded a nearly complete implementation of this CPU (thanks mostly to the MAME and Tickle projects ;) ) except for the undocumented instructions (0x08, 0x10, 0x18, 0x20, 0x28, 0x30, 0x38, 0x0CB, 0x0D9, 0x0DD, 0x0ED, 0x0FD).
I only have problems when I compile it, and I don't know why.
This is the code:
static const unsigned char cycles_table[256] =
{
/* 8080's Cycles Table */
/* 0 1 2 3 4 5 6 7 8 9 A B C D E F */
/*0*/ 4, 10, 7, 5, 5, 5, 7, 4, 0, 10, 7, 5, 5, 5, 7, 4,
/*1*/ 0, 10, 7, 5, 5, 5, 7, 4, 0, 10, 7, 5, 5, 5, 7, 4,
/*2*/ 0, 10, 16, 5, 5, 5, 7, 4, 0, 10, 16, 5, 5, 5, 7, 4,
/*3*/ 0, 10, 13, 5, 10, 10, 10, 4, 0, 10, 13, 5, 5, 5, 7, 4,
/*4*/ 5, 5, 5, 5, 5, 5, 7, 5, 5, 5, 5, 5, 5, 5, 7, 5,
/*5*/ 5, 5, 5, 5, 5, 5, 7, 5, 5, 5, 5, 5, 5, 5, 7, 5,
/*6*/ 5, 5, 5, 5, 5, 5, 7, 5, 5, 5, 5, 5, 5, 5, 7, 5,
/*7*/ 7, 7, 7, 7, 7, 7, 7, 7, 5, 5, 5, 5, 5, 5, 7, 5,
/*8*/ 4, 4, 4, 4, 4, 4, 7, 4, 4, 4, 4, 4, 4, 4, 7, 4,
/*9*/ 4, 4, 4, 4, 4, 4, 7, 4, 4, 4, 4, 4, 4, 4, 7, 4,
/*A*/ 4, 4, 4, 4, 4, 4, 7, 4, 4, 4, 4, 4, 4, 4, 7, 4,
/*B*/ 4, 4, 4, 4, 4, 4, 7, 4, 4, 4, 4, 4, 4, 4, 7, 4,
/*C*/ 5, 10, 10, 10, 11, 11, 7, 11, 5, 10, 10, 0, 11, 17, 7, 11,
/*D*/ 5, 10, 10, 10, 11, 11, 7, 11, 5, 0, 10, 10, 11, 0, 7, 11,
/*E*/ 5, 10, 10, 18, 11, 11, 7, 11, 5, 5, 10, 4, 11, 0, 7, 11,
/*F*/ 5, 10, 10, 4, 11, 11, 7, 11, 5, 5, 10, 4, 11, 0, 7, 11
};
g++ gives me this error:
8080.h:521: error: invalid in-class initialization of static data member of non-integral type `const unsigned char[256]'
This array is in a class called i8080.
Like it says, you cannot initialize static non-integral types in a class definition. That is, you could do this:
static const unsigned value = 123;
static const bool value_again = true;
But not anything else.
What you should do is place this in your class definition:
static const unsigned char cycles_table[256];
And in the corresponding source file, place what you have:
const unsigned char i8080::cycles_table[256] = // ...
What this does is say (in the definition), "Hey, there's gonna be this array." and in the source file, "Hey, here's that array."
Static data members need to be initialised outside of the class.
You cannot initialize a static array embedded within a class like this:
class Thing
{
public:
static const int vals[3] = {1, 2, 3};
};
You have to do it like this:
thing.h:
class Thing
{
public:
static const int vals[3];
};
thing.cpp:
const int Thing::vals[3] = {1, 2, 3};