The problem is as follows: write a Prolog predicate most_common_sublist(L1,N,L2) that finds the sublist L2 of length N that occurs most often in L1.
//Example 1:
?- most_common_sublist([1,2,2,3,2,2,4,2,2,3],1,L).
L=[2];
//Example 2:
?- most_common_sublist([1,2,2,3,2,2,4,2,2,3],2,L).
L=[2,2];
//Example 3:
?- most_common_sublist([1,2,2,3,2,2,4,2,2,3],3,L).
L=[2,2,3];
My approach was to generate all the possible sublists of size N using the generator predicate, check which of those is the most common one in the list using the check predicate, and then just put that as my result.
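The generate-then-count approach can be sketched in Python (rather than Prolog) as a quick way to check the expected answers. `most_common_sublists` is a hypothetical helper name, and unlike the examples it returns every sublist tied for the maximum count, in order of first occurrence.

```python
from collections import Counter


def most_common_sublists(lst, n):
    """Return every length-n contiguous sublist that occurs most often in lst."""
    # "generator" step: all contiguous windows of length n (tuples, because lists are unhashable)
    windows = [tuple(lst[i:i + n]) for i in range(len(lst) - n + 1)]
    if not windows:
        return []
    # "check" step: count each window and find the highest count
    counts = Counter(windows)
    best = max(counts.values())
    # Counter keeps insertion order (Python 3.7+), i.e. the order of first occurrence
    return [list(w) for w, c in counts.items() if c == best]


data = [1, 2, 2, 3, 2, 2, 4, 2, 2, 3]
print(most_common_sublists(data, 1))  # [[2]]
print(most_common_sublists(data, 2))  # [[2, 2]]
print(most_common_sublists(data, 3))  # [[2, 2, 3]]
```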
I'm not using the built-in predicates length/2 and append/3 (my add/3) because I'm supposed to write my own.
My generator predicate works; it gives the correct output:
?- generator([1,2,2,3,2,2,4,2,2,3],3,L).
L = [[1, 2, 2], [2, 2, 3], [2, 3, 2], [3, 2, 2], [2, 2, 4], [2, 4, 2], [4, 2, 2], [2, 2, 3]]
I checked all my predicates and they all seem to work (at least for the test cases I'm using); the problem is in the check predicate. It works fine as long as N>=P holds, but not when it is false. In that case I expect the program to move on to the next check clause (the third one) so that it stores the Temp value in Result instead of the H value. For some reason it does not go to the third check clause (I checked with the debugger); instead it does something strange that I can't figure out.
most_common_sublist(L,N,Result):-generator(L,N,LOP),check(LOP,_,Temp),add(Temp,[],Result).
add([],L,L).
add([X|L1],L2,[X|L3]):-add(L1,L2,L3).
length([],0).
length([X|O],N):-length(O,M),N is M+1.
sublist([H|_],1,[H]).
sublist([H|T],N,[H|LOP]):-M is N-1,sublist(T,M,LOP).
generator(L,N,[L]):-length(L,M),N=:=M.
generator([H|T],N,LOP):-sublist([H|T],N,PN),generator(T,N,LP),add([PN],LP,LOP).
check([],Z,K):-Z is 0,add([],[],K).
check([H|T],Hits,Result):-check_how_many(H,[H|T],N),check(T,P,_),N>=P,Hits is N,add(H,[],Result).
check([H|T],Hits,Result):-check_how_many(H,[H|T],N),check(T,P,Temp),Hits is P,add(Temp,[],Result).
check_how_many(X,[X],1).
check_how_many(_,[_],0).
check_how_many(Pattern,[H|T],Hits):-same(Pattern,H),check_how_many(Pattern,T,P),Hits is P+1.
check_how_many(Pattern,[_|T],Hits):-check_how_many(Pattern,T,P),Hits is P.
same([], []).
same([H1|R1], [H2|R2]):-
    H1 = H2,
    same(R1, R2).
Since I'm not familiar with your code, I rewrote it with similar functionality. The two places I changed are marked with a %here comment. For simplicity I used the built-in predicates length/2 and append/3 instead of your length/2 and add/3. sublist/3 has completely different code but the same functionality, and same/2 is not needed at all. Most of your uses of add/3 were unnecessary, as were some equality statements.
most_common_sublist(L,N,Temp):-
    generator(L,N,LOP),
    check(LOP,_,Temp).
sublist(L,N,S):-
    length(S,N),
    append(S,_,L).
generator(L,N,[L]):-
    length(L,N).
generator([H|T],N,LOP):-
    sublist([H|T],N,PN),
    generator(T,N,LP),
    append([PN],LP,LOP).
check([],0,[]).
check([H|T],N,H):-
    check_how_many(H,[H|T],N),
    check(T,P,_),
    N>=P.
check([H|T],P,Temp):-
    check_how_many(H,[H|T],N),
    check(T,P,Temp)
    %here
    , N=<P
    .
check_how_many(X,[X],1).
check_how_many(_,[_],0).
check_how_many(H,[H|T],Hits):-
    check_how_many(H,T,P),
    Hits is P+1.
check_how_many(Pattern,[H|T],P):-
    %here
    Pattern \== H,
    check_how_many(Pattern,T,P).
After giving up on tracing, I raised the depth limit for printed answers with
?- set_prolog_flag(answer_write_options,[max_depth(100)]).
and then used the following call to debug:
?- findall(Temp,check([[1, 2, 2], [2, 2, 1]],_,Temp),Out).
The initial output was:
Out = [[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[2,2,1],[2,2,1],[],[],[2,2,1],[2,2,1],[],[]].
This contains far too many empty lists. The first fix (%here) was to add the condition N=<P to the last check/3 clause. Without it, the last clause could also pick a P lower than N, a case that should be handled only by the second check/3 clause. The output changed to:
Out = [[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[2,2,1],[2,2,1],[2,2,1],[]].
Better, but empty lists are still possible. A similar problem occurs in the last check_how_many/3 clause: you have to state that H and Pattern are different, otherwise a matching Pattern may go uncounted. Let's check the output:
Out = [[1,2,2],[1,2,2],[1,2,2],[2,2,1]].
Way better. Let's check another case:
?- findall(Temp,check([[1, 2, 2], [1, 2, 2], [2, 2, 1]],_,Temp),Out).
Out = [[1,2,2],[1,2,2],[1,2,2],[1,2,2]].
?- findall(Temp,check([[1, 2, 2], [2, 2, 2], [1, 2, 2]],_,Temp),Out).
Out = [[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[2,2,2],[2,2,2],[2,2,2],[1,2,2]].
Works... Almost.
So the remaining problem is in check_how_many/3: change
check_how_many(_,[_],0).
to
check_how_many(_,[],0).
and you should be fine.
?- findall(Temp,check([[1, 2, 2], [2, 2, 2], [1, 2, 2]],_,Temp),Out).
Out = [[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2]].
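The effect of both fixes can be modelled outside Prolog. This Python sketch is a hypothetical model, not a general Prolog interpreter: each value a generator yields stands for one solution found on backtracking, and it assumes ground lists, so head unification becomes == and Pattern \== H becomes !=. It runs three versions of check_how_many/3 on the last test case: the question's original clauses, the version with only the \== guard, and the version with both fixes.

```python
def check_how_many(pattern, items, *, mismatch_guard, empty_base):
    """Yield one value per Prolog solution of check_how_many(pattern, items, Hits)."""
    # Clause 1: check_how_many(X,[X],1).
    if len(items) == 1 and items[0] == pattern:
        yield 1
    # Clause 2: check_how_many(_,[_],0) originally, check_how_many(_,[],0) after the fix.
    if (empty_base and not items) or (not empty_base and len(items) == 1):
        yield 0
    if not items:
        return
    head, tail = items[0], items[1:]
    flags = {"mismatch_guard": mismatch_guard, "empty_base": empty_base}
    # Clause 3: the head matches the pattern, so count it.
    if head == pattern:
        for hits in check_how_many(pattern, tail, **flags):
            yield hits + 1
    # Clause 4: skip the head; the fixed version requires Pattern \== H here.
    if not mismatch_guard or head != pattern:
        for hits in check_how_many(pattern, tail, **flags):
            yield hits


pattern = [1, 2, 2]
items = [[1, 2, 2], [2, 2, 2], [1, 2, 2]]

original = list(check_how_many(pattern, items, mismatch_guard=False, empty_base=False))
guarded = list(check_how_many(pattern, items, mismatch_guard=True, empty_base=False))
fixed = list(check_how_many(pattern, items, mismatch_guard=True, empty_base=True))
print(sorted(set(original)))  # [0, 1, 2]: wrong counts 0 and 1 on backtracking
print(sorted(set(guarded)))   # [1, 2]: still a wrong count of 1
print(fixed)                  # [2, 2]: only the correct count (found twice)
```

Only the fully fixed version agrees on a single count. It still produces that count twice, because clauses 1 and 3 both succeed on a matching one-element list, which likely contributes to the repeated answers in the output above.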
Since it is way more fun to write code yourself than to debug someone else's, I'll add another answer with my own attempt.
Here is my attempt. It works differently from yours because I do not calculate all possible sublists but work on the "leftover" list. I use the built-in predicates length/2, append/3 and member/2, each of which takes about three lines to write yourself.
% Count how often the list in the 2nd argument occurs within the list in the 1st argument.
countit([],_,Val,Val).
countit([H|In],Out,Past,Future):-
( append(Out,_,[H|In])
-> Present is Past+1,
countit(In,Out,Present,Future)
; countit(In,Out,Past,Future)
).
mostCommonSublist(In,N,Out):-
maxStartList(In,N,OutList,Max),
member((Max,Out),OutList).
% for every endlist calculate how often the first N elements appear within the endlist, track the max
maxStartList(In,N,[(1,In)],1):-
length(In,N),
!.
maxStartList([H|In],N,[(CntH,Curr)|MaxList],Max):-
length(Curr,N),
countit([H|In],Curr,0,CntH),
maxStartList(In,N,MaxList,CntIn),
Max is max(CntH , CntIn).
The main predicate mostCommonSublist/3 calls maxStartList/4 to get all count/sublist pairs. Afterwards it checks whether the count of a sublist equals the maximum. This is necessary to find all answers with the same (maximum) count.
maxStartList/4 drops elements from the input list one at a time and counts how often the start of the current list occurs within it. It also keeps track of the maximum.
For the current input list it calls countit/4, which calculates the number of occurrences of a sublist (2nd argument) in a given input list (1st argument).
My code uses a twist: when countit/4 is called for the first time, the content of the sublist is not yet bound; only its length is set. In the first recursion step, append/3 unifies its entries with the first elements of the input list and counts that as an occurrence. In the following steps the sublist is fully known, and the if-then-else (..->..;..) distinguishes the two cases, whether the remaining input list starts with the sublist or not, so the predicate simply counts the occurrences. maxStartList/4 repeats this until the remaining input list has only N elements left (length(In,N)).
The calculated count/sublist pairs are stored in a list, and the maximum is tracked as well.
Once all count/sublist pairs are known, I finalize everything by requiring that the count of an accepted sublist equals the maximum.
The nice thing is that there are no duplicate answers: only the suffix that starts at the leftmost occurrence of a sublist contains all of its occurrences, so only that pair can reach the maximum count.
?- mostCommonSublist([1,2,2,3,2,2,4,2,2,3],3,L).
L = [2,2,3] ;
false.
?- mostCommonSublist([1,2,2,1,2,1,2,2,2,3],3,L).
L = [1,2,2] ;
L = [2,1,2] ;
false.
?- mostCommonSublist([1,2,2,1,2,1,2,2,2,1],2,L).
L = [1,2] ;
L = [2,2] ;
L = [2,1] ;
false.
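For comparison, the suffix-based idea behind maxStartList/4 and countit/4 can be sketched in Python. This is a hypothetical translation for illustration, not the answer's code; the function names merely mirror the Prolog predicates. It reproduces the three queries above, including the absence of duplicate answers.

```python
def countit(items, sub):
    """Count the suffixes of items that start with sub, i.e. the occurrences of sub."""
    n = len(sub)
    return sum(1 for i in range(len(items)) if items[i:i + n] == sub)


def most_common_sublist(items, n):
    """Return all length-n sublists with the maximal count, each exactly once."""
    pairs = []
    # Like maxStartList/4: visit every suffix that still has at least n elements.
    for start in range(len(items) - n + 1):
        suffix = items[start:]
        head = suffix[:n]  # the candidate sublist is the start of the current suffix
        # Occurrences are counted only within the suffix, not the whole list.
        pairs.append((countit(suffix, head), head))
    if not pairs:
        return []
    best = max(count for count, _ in pairs)
    # Like member((Max,Out),OutList): keep the pairs whose count equals the maximum.
    return [sub for count, sub in pairs if count == best]


print(most_common_sublist([1, 2, 2, 3, 2, 2, 4, 2, 2, 3], 3))  # [[2, 2, 3]]
print(most_common_sublist([1, 2, 2, 1, 2, 1, 2, 2, 2, 3], 3))  # [[1, 2, 2], [2, 1, 2]]
print(most_common_sublist([1, 2, 2, 1, 2, 1, 2, 2, 2, 1], 2))  # [[1, 2], [2, 2], [2, 1]]
```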
If I have an input like (1, 2, 3, 4, 5, 6), the output has to be [[1, 2], [3, 4], [5, 6]].
I know how to do this with one element per inner list, but not with two:
numbers = (1, 2, 3, 4, 5, 6)
x = []
for number in numbers:
    x.append([number])
I'd appreciate any help!
Something like this would work:
out = []
lst = (1,2,3,4,5,6,7,8,9,10)
for x in range(len(lst)):
    if x % 2 == 0:
        out.append([lst[x], lst[x+1]])
    else:
        continue
To use this, just set lst equal to whatever list of numbers you want. The final product is stored in out.
There is a shorter way of doing what you want:
L = (1,2,3,4,5,6,7,8,9,10)
result = [[L[i], L[i + 1]] for i in range(0, len(L) - 1, 2)]
print(result)
You can use something like this. This solution also works for lists of odd length:
def func(lst):
    res = []
    # Go through every 2nd index: 0, 2, 4, ...
    for i in range(0, len(lst), 2):
        # Append a slice of the list; i + 2 includes the next value
        res.append(lst[i : i + 2])
    return res
# Output
>>> lst = [1, 2, 3, 4, 5, 6]
>>> func(lst)
[[1, 2], [3, 4], [5, 6]]
>>> lst2 = [1, 2, 3, 4, 5, 6, 7]
>>> func(lst2)
[[1, 2], [3, 4], [5, 6], [7]]
List comprehension solution:
def func(lst):
    return [lst[i:i+2] for i in range(0, len(lst), 2)]
Slicing is better in this case because you don't have to guard against IndexError, which lets it work for odd lengths as well.
If you want, you can also add a parameter that specifies the desired number of elements per inner list.
def func(lst, size=2):  # default of 2 if none specified
    return [lst[i:i+size] for i in range(0, len(lst), size)]
There are a few hurdles in this problem: you want to iterate through the list without going past its end, and you need to deal with the case where the list has an odd length. Here's one solution that works (an odd trailing element is simply dropped):
def foo(lst):
    result = [[x,y] for [x,y] in zip(lst[0::2], lst[1::2])]
    return result
In case this seems convoluted, let's break the code down.
Index slicing:
lst[0::2] iterates through lst starting at the 0th element and proceeding in increments of 2. Similarly, lst[1::2] starts at the 1st element (colloquially, the second element) and also proceeds in increments of 2.
Example:
>>> lst = (1,2,3,4,5,6,7)
>>> print(lst[0::2])
(1, 3, 5, 7)
>>> print(lst[1::2])
(2, 4, 6)
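zip: zip() takes two lists (or any iterables, for that matter) and pairs up their elements into tuples. In Python 2 it returns a list; in Python 3 it returns an iterator, so wrap it in list() to print the pairs. Example:
>>> lst1 = (10,20,30,40)
>>> lst2 = (15,25,35)
>>> print(list(zip(lst1, lst2)))
[(10, 15), (20, 25), (30, 35)]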
Notice that zip(lst1, lst2) has the nice property that if one of its arguments is longer than the other, zip() stops as soon as the shortest iterable runs out of items.
List comprehension: Python allows iteration quite generally. Consider the statement:
>>> [[x,y] for [x,y] in zip(lst1,lst2)]
The interior part "for [x,y] in zip(lst1,lst2)" says "iterate through all pairs of values in zip, and assign them to x and y". The rest of the statement, "[[x,y] for [x,y] ...]", says "for each pair of values that x and y take on, make a list [x,y] to be stored in a larger list". Once this statement executes, you have a list of lists, where the inner lists are all the pairs produced by zip(lst1,lst2).
A very clear solution:
l = (1, 2, 3, 4, 5, 6)
l = iter(l)
w = []
for i in l:
    sub = []
    sub.append(i)
    sub.append(next(l))
    w.append(sub)
print(w)
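The same shared-iterator idea can be written without calling next() by hand: zipping an iterator with itself takes two consecutive items per step, because zip() evaluates its iterables left to right. This is a sketch in Python 3 syntax with a hypothetical helper name `pairs`. Like the zip() answer, it silently drops a trailing element when the input has odd length, whereas the loop above raises StopIteration in that case.

```python
def pairs(seq):
    """Group seq into consecutive pairs using one shared iterator."""
    it = iter(seq)
    # zip pulls from the same iterator twice per step: (seq[0], seq[1]), (seq[2], seq[3]), ...
    return [list(pair) for pair in zip(it, it)]


print(pairs((1, 2, 3, 4, 5, 6)))     # [[1, 2], [3, 4], [5, 6]]
print(pairs((1, 2, 3, 4, 5, 6, 7)))  # [[1, 2], [3, 4], [5, 6]]  (7 is dropped)
```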
I have the following 2-D lists/arrays:
list1 = [[1,2],[3,4]]
list2 = [[3,4],[5,6]]
How can I use a function like union1d(x, y) to combine list1 and list2 into one list without duplicate rows:
list3 = [[1,2],[3,4],[5,6]]
union1d just does:
unique(np.concatenate((ar1, ar2)))
so if you have a method of finding unique rows, you have the solution.
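In NumPy 1.13 and later, np.unique can find unique rows directly through its axis argument, which sidesteps the structured-dtype trick. A minimal sketch using the question's variable names; note that the rows come back sorted, not in input order.

```python
import numpy as np

list1 = [[1, 2], [3, 4]]
list2 = [[3, 4], [5, 6]]

# Stack the rows of both inputs, then keep each distinct row once.
# With axis=0, np.unique compares whole rows and returns them sorted.
list3 = np.unique(np.concatenate((list1, list2)), axis=0)
print(list3.tolist())  # [[1, 2], [3, 4], [5, 6]]
```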
As described in the suggested link and elsewhere, you can do this by converting the array to a 1-d structured array. Here is the simple version.
If arr is:
arr=np.array([[1,2],[3,4],[3,4],[5,6]])
the structured equivalent (a view, same data):
In [4]: arr.view('i,i')
Out[4]:
array([[(1, 2)],
[(3, 4)],
[(3, 4)],
[(5, 6)]],
dtype=[('f0', '<i4'), ('f1', '<i4')])
In [5]: np.unique(arr.view('i,i'))
Out[5]:
array([(1, 2), (3, 4), (5, 6)],
dtype=[('f0', '<i4'), ('f1', '<i4')])
and back to 2d int:
In [7]: np.unique(arr.view('i,i')).view('2int')
Out[7]:
array([[1, 2],
[3, 4],
[5, 6]])
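This solution does require a certain familiarity with compound dtypes. It also assumes that arr holds 4-byte integers, so that each row matches exactly one 'i,i' record, as the '<i4' fields in the output show.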
Using return_index saves that return view: we can index arr directly with the returned indices:
In [54]: idx=np.unique(arr.view('i,i'),return_index=True)[1]
In [55]: arr[idx,:]
Out[55]:
array([[1, 2],
[3, 4],
[5, 6]])
For what it's worth, unique does a sort and then uses a mask to remove adjacent duplicates. It's the sort that requires a 1-d array; the rest works in 2-d.
Here arr is already sorted:
In [42]: flag=np.concatenate([[True],(arr[1:,:]!=arr[:-1,:]).any(axis=1)])
In [43]: flag
Out[43]: array([ True, True, False, True], dtype=bool)
In [44]: arr[flag,:]
Out[44]:
array([[1, 2],
[3, 4],
[5, 6]])
https://stackoverflow.com/a/16971324/901925 shows this working with lexsort.
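The sort-then-mask idea extends to unsorted input by ordering the rows with np.lexsort first. A sketch under that approach, with `unique_rows` as a hypothetical helper name; the mask uses any(axis=1), so rows such as [1, 2] and [1, 3] that differ in only one column are both kept.

```python
import numpy as np


def unique_rows(arr):
    """Return the distinct rows of a 2-D array, sorted lexicographically."""
    arr = np.asarray(arr)
    if arr.shape[0] == 0:
        return arr
    # np.lexsort uses the LAST key as the primary key, so reverse the columns
    # to sort by column 0 first, then column 1, and so on.
    order = np.lexsort(arr.T[::-1])
    srt = arr[order]
    # A sorted row is new if it differs from its predecessor in ANY column.
    flag = np.concatenate(([True], (srt[1:] != srt[:-1]).any(axis=1)))
    return srt[flag]


arr = np.array([[3, 4], [1, 2], [5, 6], [3, 4], [1, 3]])
print(unique_rows(arr).tolist())  # [[1, 2], [1, 3], [3, 4], [5, 6]]
```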
================
The mention of np.union1d led Divakar and me to focus on NumPy methods. But since the question starts with lists (of lists), it is likely to be faster to use Python set methods.
For example, using list and set comprehensions:
In [99]: [list(x) for x in {tuple(x) for x in list1+list2}]
Out[99]: [[1, 2], [3, 4], [5, 6]]
You could also take the set for each list, and do a set union.
The tuple conversion is needed because a list isn't hashable.
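A sketch of that set-union variant, with sorted() added because sets do not preserve order:

```python
list1 = [[1, 2], [3, 4]]
list2 = [[3, 4], [5, 6]]

# Lists are unhashable, so convert each row to a tuple before building the sets.
union = set(map(tuple, list1)) | set(map(tuple, list2))
# Sets are unordered; sort for a deterministic result, then convert back to lists.
list3 = [list(row) for row in sorted(union)]
print(list3)  # [[1, 2], [3, 4], [5, 6]]
```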
One approach would be to stack the two input arrays vertically with np.vstack and then find the unique rows. It would be memory-intensive, though, as we would build a stacked array only to discard rows from it afterwards.
Another approach would be to find the rows of the first array that are exclusive to it, i.e. not present in the second array, and then stack just those exclusive rows along with the second array. Of course, this assumes that each input array has no duplicate rows of its own.
The crux of such a memory-saving implementation is getting those exclusive rows from the first array. To do so, we convert each row into an equivalent linear index by treating it as an indexing tuple on an n-dimensional grid, where n is the number of columns in the input arrays (this requires non-negative integer entries). Thus, with input arrays arr1 and arr2, we would have an implementation like so:
import numpy as np
# Get the dims of the n-d grid onto which the linear index equivalents are mapped
dims = np.maximum(arr1.max(0),arr2.max(0)) + 1
# Get linear index equivalents for arr1, arr2
idx1 = np.ravel_multi_index(arr1.T,dims)
idx2 = np.ravel_multi_index(arr2.T,dims)
# Finally get the exclusive rows and stack them with arr2 for the desired output
out = np.vstack((arr1[~np.isin(idx1,idx2)],arr2))
Sample run -
In [93]: arr1
Out[93]:
array([[1, 2],
[3, 4],
[5, 3]])
In [94]: arr2
Out[94]:
array([[3, 4],
[5, 6]])
In [95]: out
Out[95]:
array([[1, 2],
[5, 3],
[3, 4],
[5, 6]])
For more info on setting up those linear index equivalents, please refer to this post.