To check if sublist exists in another list - list

coll = [[3, 3], [2, 2, 2], [2, 4], [2, 3], [2, 2]]
main = [4, 3, 3, 2, 2, 2, 2, 2, 2, 2]
I have 2 lists. 'coll' is a list of lists, where each sublist contains integers that might have duplicates (e.g. [2, 2, 2]), and 'main' is a list of integers. I want to check whether the elements of each sublist of 'coll' are present in 'main'. In this case the answer is true, since [2, 2, 2], [3, 3] and the other sublists are all present. Order doesn't matter, either in the sublist or in 'main': the elements of a sublist may appear at any position in 'main'.
I cannot use sets because of the duplicates, and I also cannot use strings, because:
coll = ['222']
main = ['423262']
I have used a single sublist to show the problem with using strings. My algorithm should return True in this case too, because '2' is present at 3 locations (indices 1, 3 and 5). But:
if coll in main:
    return True
else:
    return False
This returns False when I use strings for the check.
Please suggest a method.

I think the most readable way to do this is to create a Counter instance for each of your sublists, and then use the list "count" method to check that main meets the requirement for each element of the sublist:
from collections import Counter

def checksub(main, sublist):
    # Count each value in the sublist, then make sure main has at least that many
    c = Counter(sublist)
    for num, count in c.items():
        if main.count(num) < count:
            return False
    return True

all(checksub(main, sublist) for sublist in coll)
This is not fast. If you are iterating over a large data volume, you'd better map the "main" list into a data structure where the counts can be checked faster than with "count". Or, if there are few distinct numbers, even something as simple as caching the result of "count" for each distinct number will help.
Otherwise for small sizes of "main" this might suffice.
On a second reading of your question, it seems you only require that one of the sublists be present in main. If that is the case, just replace the call to all with any above.
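A minimal sketch of the faster approach mentioned above, assuming it is acceptable to count main once with collections.Counter: the counts are built a single time, and each sublist is then checked with dictionary lookups instead of repeated list.count calls. has_enough is a hypothetical helper name, not part of the answer above.

```python
from collections import Counter

def has_enough(main_counts, sublist):
    # Every value in sublist must occur in main at least as often as in sublist.
    # A Counter returns 0 for missing keys, so absent values fail the check.
    return all(main_counts[num] >= count for num, count in Counter(sublist).items())

coll = [[3, 3], [2, 2, 2], [2, 4], [2, 3], [2, 2]]
main = [4, 3, 3, 2, 2, 2, 2, 2, 2, 2]

main_counts = Counter(main)  # built once, reused for every sublist
print(all(has_enough(main_counts, s) for s in coll))  # True
```

Because Counter accepts any iterable, the same check also handles the string example from the question: has_enough(Counter('423262'), '222') is True. Replace all with any for the "at least one sublist" variant.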

Related

Most common sublist in Prolog

The problem is as follows: Write a predicate in Prolog most_common_sublist(L1,N,L2) that will find the sublist L2 with length N such that it is the most common sublist in L1.
//Example 1:
?- most_common_sublist([1,2,2,3,2,2,4,2,2,3],1,L).
L=[2];
//Example 2:
?- most_common_sublist([1,2,2,3,2,2,4,2,2,3],2,L).
L=[2,2];
//Example 3:
?- most_common_sublist([1,2,2,3,2,2,4,2,2,3],3,L).
L=[2,2,3];
My approach was to generate all the possible sublists of size N using the generator predicate, check which of those is the most common one in the list using the check predicate, and then just put that as my result.
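For cross-checking expected results, the same generate-and-count approach can be sketched in a few lines of Python (not Prolog, and not the predicates used here): collect every contiguous window of length N, count them, and keep the windows that tie for the maximum. most_common_sublists is a hypothetical name.

```python
from collections import Counter

def most_common_sublists(lst, n):
    # No window of length n exists outside 1..len(lst)
    if n < 1 or n > len(lst):
        return []
    # Contiguous windows of length n, as tuples so they can be Counter keys
    windows = [tuple(lst[i:i + n]) for i in range(len(lst) - n + 1)]
    counts = Counter(windows)
    best = max(counts.values())
    # Every window that ties for the maximum, in order of first appearance
    return [list(w) for w, c in counts.items() if c == best]

data = [1, 2, 2, 3, 2, 2, 4, 2, 2, 3]
print(most_common_sublists(data, 1))  # [[2]]
print(most_common_sublists(data, 2))  # [[2, 2]]
print(most_common_sublists(data, 3))  # [[2, 2, 3]]
```

Returning all tied windows, rather than just one, matches the behaviour of the second answer further down, which enumerates every sublist with the maximum count.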
I'm not using the built-in predicates for length and append (my add) because I'm supposed to write my own.
My generator predicate works, it gives out the correct output.
?- generator([1,2,2,3,2,2,4,2,2,3],3,L).
L = [[1, 2, 2], [2, 2, 3], [2, 3, 2], [3, 2, 2], [2, 2, 4], [2, 4, 2], [4, 2, 2], [2, 2, 3]]
I checked all my predicates and they all seem to work (at least for the test cases I'm using); the problem occurs with the check predicate. It works fine when N>=P is true, but not when it is false. In that case I expect the program to go on to the third check clause, so that it stores the Temp value in Result instead of the H value. For some reason it does not reach the third check clause (I checked with the debugger); instead it does something strange that I can't figure out.
most_common_sublist(L,N,Result):-generator(L,N,LOP),check(LOP,_,Temp),add(Temp,[],Result).
add([],L,L).
add([X|L1],L2,[X|L3]):-add(L1,L2,L3).
length([],0).
length([X|O],N):-length(O,M),N is M+1.
sublist([H|_],1,[H]).
sublist([H|T],N,[H|LOP]):-M is N-1,sublist(T,M,LOP).
generator(L,N,[L]):-length(L,M),N=:=M.
generator([H|T],N,LOP):-sublist([H|T],N,PN),generator(T,N,LP),add([PN],LP,LOP).
check([],Z,K):-Z is 0,add([],[],K).
check([H|T],Hits,Result):-check_how_many(H,[H|T],N),check(T,P,_),N>=P,Hits is N,add(H,[],Result).
check([H|T],Hits,Result):-check_how_many(H,[H|T],N),check(T,P,Temp),Hits is P,add(Temp,[],Result).
check_how_many(X,[X],1).
check_how_many(_,[_],0).
check_how_many(Pattern,[H|T],Hits):-same(Pattern,H),check_how_many(Pattern,T,P),Hits is P+1.
check_how_many(Pattern,[_|T],Hits):-check_how_many(Pattern,T,P),Hits is P.
same([], []).
same([H1|R1], [H2|R2]):-
    H1 = H2,
    same(R1, R2).
Since I'm not familiar with your code, I rewrote it with similar functionality. Lines marked with %here are my improvements (used twice). For simplicity I used the built-in predicates length/2 and append/3 instead of add/3. sublist/3 has completely different code but the same functionality, and same/2 is not necessary at all. Most uses of your add/3 were unnecessary, as were some equality statements.
most_common_sublist(L,N,Temp):-
    generator(L,N,LOP),
    check(LOP,_,Temp).
sublist(L,N,S):-
    length(S,N),
    append(S,_,L).
generator(L,N,[L]):-
    length(L,N).
generator([H|T],N,LOP):-
    sublist([H|T],N,PN),
    generator(T,N,LP),
    append([PN],LP,LOP).
check([],0,[]).
check([H|T],N,H):-
    check_how_many(H,[H|T],N),
    check(T,P,_),
    N>=P.
check([H|T],P,Temp):-
    check_how_many(H,[H|T],N),
    check(T,P,Temp),
    %here
    N=<P.
check_how_many(X,[X],1).
check_how_many(_,[_],0).
check_how_many(H,[H|T],Hits):-
    check_how_many(H,T,P),
    Hits is P+1.
check_how_many(Pattern,[H|T],P):-
    %here
    Pattern \== H,
    check_how_many(Pattern,T,P).
After giving up on tracing, I enabled long output with
?- set_prolog_flag(answer_write_options,[max_depth(100)]).
and then used the following call to debug:
?- findall(Temp,check([[1, 2, 2], [2, 2, 1]],_,Temp),Out).
Initial output was
Out = [[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[2,2,1],[2,2,1],[],[],[2,2,1],[2,2,1],[],[]].
This contains far too many empty lists. The first fix (%here) was to add the condition N=<P to the last check/3 clause. Without it, that clause could pick a P lower than N, a case that should be handled by the second check/3 clause. The output changed to
Out = [[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[2,2,1],[2,2,1],[2,2,1],[]].
Better, but empty lists are still possible. A similar problem occurs in the last check_how_many/3 clause: you have to state that H and Pattern are different, otherwise a matching Pattern could skip being counted. Let's check the output:
Out = [[1,2,2],[1,2,2],[1,2,2],[2,2,1]].
Way better. Let's check another case:
?- findall(Temp,check([[1, 2, 2], [1, 2, 2], [2, 2, 1]],_,Temp),Out).
Out = [[1,2,2],[1,2,2],[1,2,2],[1,2,2]].
?- findall(Temp,check([[1, 2, 2], [2, 2, 2], [1, 2, 2]],_,Temp),Out).
Out = [[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[2,2,2],[2,2,2],[2,2,2],[1,2,2]].
Works... Almost.
So the problem seems to be check_how_many/3: alter
check_how_many(_,[_],0).
to
check_how_many(_,[],0).
and you should be fine.
?- findall(Temp,check([[1, 2, 2], [2, 2, 2], [1, 2, 2]],_,Temp),Out).
Out = [[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2],[1,2,2]].
Since it is way more fun to write the code yourself than to debug foreign code I'll add another answer with my attempt.
It is way more fun to code by yourself than to debug someone else's code, so here is my attempt. It works differently from yours: I do not generate the possible sublists but work on the "leftover" list. I use the built-in predicates length/2, append/3 and member/2, each of which is only a few lines to write yourself.
% countit(List, Sub, Acc0, Count): count how often Sub occurs as a
% contiguous sublist of List, starting from the accumulator Acc0.
countit([],_,Val,Val).
countit([H|In],Out,Past,Future):-
    (   append(Out,_,[H|In])
    ->  Present is Past+1,
        countit(In,Out,Present,Future)
    ;   countit(In,Out,Past,Future)
    ).
mostCommonSublist(In,N,Out):-
    maxStartList(In,N,OutList,Max),
    member((Max,Out),OutList).
% For every suffix of the list, count how often its first N elements occur
% within that suffix, and track the maximum.
maxStartList(In,N,[(1,In)],1):-
    length(In,N),
    !.
maxStartList([H|In],N,[(CntH,Curr)|MaxList],Max):-
    length(Curr,N),
    countit([H|In],Curr,0,CntH),
    maxStartList(In,N,MaxList,CntIn),
    Max is max(CntH , CntIn).
The main predicate mostCommonSublist/3 calls maxStartList/4 to get all count/sublist pairs. It then checks whether the count of a sublist equals the maximum. This is necessary to find all answers that share the same (maximum) count.
maxStartList/4 drops elements from the input list one at a time and counts how often the start of the current list occurs within it. It also keeps track of the maximum.
For the current input list, the counting predicate countit/4 is called. It calculates, for a given input list (first argument), the number of occurrences of a sublist (second argument).
My code uses a twist: when countit/4 is called for the first time, the content of the sublist is not yet bound; only its length is set. In the first step, append/3 unifies all its entries with the first elements of the input list and counts that occurrence. In the following recursion steps the sublist is fully known. The if-then-else (..->..;..) covers the two cases, whether or not the remaining input list starts with the sublist, so the predicate counts the occurrences. maxStartList/4 repeats this for each shorter suffix until the remaining input list has only N elements left (length(In,N)).
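The counting step performed by countit/4 (does the remaining list start with the pattern? if so, add one, then move on by one element) can be sketched in Python for comparison. count_occurrences is a hypothetical name; like countit/4, it counts overlapping occurrences.

```python
def count_occurrences(lst, pattern):
    # For each start position i, check whether lst[i:] begins with pattern
    n = len(pattern)
    return sum(1 for i in range(len(lst) - n + 1) if lst[i:i + n] == pattern)

print(count_occurrences([1, 2, 2, 3, 2, 2, 4, 2, 2, 3], [2, 2]))  # 3
print(count_occurrences([2, 2, 2], [2, 2]))                       # 2 (overlapping)
```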
The calculated count/sublist pairs are stored in a list, the maximum is tracked as well.
Once all count/sublist pairs are known, I finalize by requiring that the count of an accepted sublist equals the maximum.
The nice thing is that there are no duplicate answers.
?- mostCommonSublist([1,2,2,3,2,2,4,2,2,3],3,L).
L = [2,2,3] ;
false.
?- mostCommonSublist([1,2,2,1,2,1,2,2,2,3],3,L).
L = [1,2,2] ;
L = [2,1,2] ;
false.
?- mostCommonSublist([1,2,2,1,2,1,2,2,2,1],2,L).
L = [1,2] ;
L = [2,2] ;
L = [2,1] ;
false.

How to write the result in list instead of printing out in prolog

I'm writing a predicate that finds the bigger number in each adjacent pair. The last number has no pair, so it is just added as is.
write_list([A|[]]):- write(A).
write_list([A, B|Tail]) :- ((A>B, write(A));(A<B,write(B))), nl,
    write_list([B|Tail]).
My problem is, I cannot figure out how to write a result in another list instead of printing the result out:
write_list([1,2,6,8,5], X).
X = [2,6,8,8,5].
write/1 only prints the content to standard output; it does not "yield" it to a result list. In Prolog, the only way to produce values is through unification.
You thus need to define a predicate maxpair/2, not write_list/1.
The predicate thus looks like:
:- use_module(library(clpfd)).
maxpair([A], [A]).
maxpair([A, B|Tail], [H|T]) :-
    H #= max(A, B),
    maxpair([B|Tail], T).
The first clause says that the maxpair/2 of a singleton list is that singleton list. The second says that the maxpair/2 of a list containing two or more elements is a list that starts with the maximum of the first two elements, followed by the result of recursing on the list without its first element.
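For comparison, the same forward computation in Python is a one-liner over adjacent pairs. This is only a sketch: unlike the CLP(FD) version it cannot run in reverse, and it returns [] for an empty list where the Prolog predicate simply fails.

```python
def maxpair(lst):
    if not lst:
        return []
    # Larger element of each adjacent pair; the last element has no right
    # neighbour, so it is carried over unchanged (like maxpair([A], [A]))
    return [max(a, b) for a, b in zip(lst, lst[1:])] + [lst[-1]]

print(maxpair([1, 2, 6, 8, 5]))  # [2, 6, 8, 8, 5]
```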
The above can also run in reverse: given the result list, it works out what the input list can be. For example:
?- maxpair(L, [5, 3, 2, 1]).
L = [5, 3, 2, 1] ;
false.
?- maxpair(L, [1, 4, 2, 5]).
false.
?- maxpair(L, [3, 3, 5, 5]).
L = [_542, _548, _554, 5],
_542 in inf..3,
3#=max(_542, _548),
_548 in inf..3,
3#=max(_548, _554),
_554 in inf..3 ;
false.
?- maxpair(L, [3, 5, 5, 4]).
L = [_1128, _1134, 5, 4],
_1128 in inf..3,
3#=max(_1128, _1134),
_1134 in inf..3 ;
false.
So depending on the situation it can:
fully reconstruct the list;
construct a list with some variables with intervals; or
prove that it is impossible to construct such a list.

Python 3.2: append two elements to a list at a time (lists in a list)

If I have an input like this: (1, 2, 3, 4, 5, 6)
the output has to be [[1, 2], [3, 4], [5, 6]].
I know how to do it with one element per sublist, but not two:
x=[]
for number in numbers:
    x.append([number])
I'd appreciate any help!
Something like this would work:
out = []
lst = (1,2,3,4,5,6,7,8,9,10)
for x in range(len(lst)):
    if x % 2 == 0:
        out.append([lst[x], lst[x+1]])
    else:
        continue
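To use this, just set lst to whatever list of numbers you want; the final result is stored in out. This assumes an even number of elements: with an odd length, lst[x+1] raises IndexError on the last element.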
There is a shorter way of doing what you want:
result = []
L = (1,2,3,4,5,6,7,8,9,10)
result = [[L[i], L[i + 1]] for i in range(0, len(L) - 1, 2)]
print(result)
You can use something like this. This solution also works for lists of odd length:
def func(lst):
    res = []
    # Go through every 2nd value | 0, 2, 4, ...
    for i in range(0, len(lst), 2):
        # Append a slice of the list, + 2 to include the next value
        res.append(lst[i : i + 2])
    return res
# Output
>>> lst = [1, 2, 3, 4, 5, 6]
>>> func(lst)
[[1, 2], [3, 4], [5, 6]]
>>> lst2 = [1, 2, 3, 4, 5, 6, 7]
>>> func(lst2)
[[1, 2], [3, 4], [5, 6], [7]]
List comprehension solution
def func(lst):
    return [lst[i:i+2] for i in range(0, len(lst), 2)]
Slicing is better here because it never raises IndexError: a slice that runs past the end of the list is simply truncated, which is why this also works for odd lengths.
If you want you can also add another parameter to let you specify the desired number of inner elements.
def func(lst, size=2):  # default of 2 if none specified
    return [lst[i:i+size] for i in range(0, len(lst), size)]
There are a few hurdles in this problem: you want to iterate through the list without going past its end, and you need to deal with the case where the list has an odd length. Here's one solution that works:
def foo(lst):
    result = [[x,y] for [x,y] in zip(lst[0::2], lst[1::2])]
    return result
In case this seems convoluted, let's break the code down.
Index slicing:
lst[0::2] iterates through lst by starting at the 0th element and proceeds in increments of 2. Similarly lst[1::2] iterates through starting at the 1st element (colloquially the second element) and continues in increments of 2.
Example:
>>> lst = (1,2,3,4,5,6,7)
>>> print(lst[0::2])
(1, 3, 5, 7)
>>> print(lst[1::2])
(2, 4, 6)
zip: zip() takes two lists (or any iterables, for that matter) and, in Python 3, returns an iterator of tuples; wrap it in list() to see them. Example:
>>> lst1 = (10,20,30, 40)
>>> lst2 = (15,25,35)
>>> print(list(zip(lst1, lst2)))
[(10, 15), (20, 25), (30, 35)]
Notice that zip(lst1, lst2) has the useful property that if one of its arguments is longer than the other, it stops as soon as the shortest iterable runs out of items. For an odd-length list, this means the last element is dropped.
List comprehension: python allows iteration quite generally. Consider the statement:
>>> [[x,y] for [x,y] in zip(lst1,lst2)]
The interior bit "for [x,y] in zip(lst1,lst2)" says "iterate through all pairs of values in zip, and give their values to x and y". The rest of the statement,
"[[x,y] for [x,y] ...]", says "for each pair of values x and y take on, make a list [x,y] to be stored in a larger list". Once this statement executes, you have a list of lists, where the inner lists are all the pairs produced by zip(lst1,lst2).
Very Clear solution:
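l = (1, 2, 3, 4, 5, 6)
l = iter(l)
w = []
for i in l:
    # next(l) takes the following item from the same iterator;
    # this assumes an even number of elements
    sub = []
    sub.append(i)
    sub.append(next(l))
    w.append(sub)
print(w)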
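The iterator idea in the last answer generalizes: passing the same iterator to zip several times makes zip pull consecutive items. The sketch below uses itertools.zip_longest so that an odd-length input keeps its last element, padded with a fill value; pairs is a hypothetical helper name.

```python
from itertools import zip_longest

def pairs(seq, size=2, fill=None):
    # The same iterator object appears `size` times, so each output group
    # takes `size` consecutive items from seq
    it = iter(seq)
    return [list(group) for group in zip_longest(*[it] * size, fillvalue=fill)]

print(pairs((1, 2, 3, 4, 5, 6)))     # [[1, 2], [3, 4], [5, 6]]
print(pairs((1, 2, 3, 4, 5, 6, 7)))  # [[1, 2], [3, 4], [5, 6], [7, None]]
```

Using plain zip instead of zip_longest drops an incomplete final group, which matches the behaviour of the zip-based answer above.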

python 2-D array: a function like np.unique or union1d for rows

I have two 2-D lists/arrays:
list1 = [[1,2],[3,4]]
list2 = [[3,4],[5,6]]
How can I use a function like union1d(x, y) to combine list1 and list2 into one list?
list3 = [[1,2],[3,4],[5,6]]
union1d just does:
unique(np.concatenate((ar1, ar2)))
so if you have a method of finding unique rows, you have the solution.
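On NumPy 1.13 or later, np.unique accepts an axis argument, which finds unique rows directly without the structured-dtype view used below. A minimal sketch, assuming a NumPy version with axis support; union_rows_np is a hypothetical helper name. Like union1d, the result comes back sorted (here, lexicographically by row).

```python
import numpy as np

def union_rows_np(a, b):
    # Stack both inputs, then keep unique rows; axis=0 compares whole rows
    stacked = np.vstack((np.asarray(a), np.asarray(b)))
    return np.unique(stacked, axis=0)

list1 = [[1, 2], [3, 4]]
list2 = [[3, 4], [5, 6]]
print(union_rows_np(list1, list2).tolist())  # [[1, 2], [3, 4], [5, 6]]
```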
As described in the suggested link, and elsewhere, you can do this by converting the array to a 1d structured array. Here is the simple version.
If arr is:
arr=np.array([[1,2],[3,4],[3,4],[5,6]])
the structured equivalent (a view, same data):
In [4]: arr.view('i,i')
Out[4]:
array([[(1, 2)],
[(3, 4)],
[(3, 4)],
[(5, 6)]],
dtype=[('f0', '<i4'), ('f1', '<i4')])
In [5]: np.unique(arr.view('i,i'))
Out[5]:
array([(1, 2), (3, 4), (5, 6)],
dtype=[('f0', '<i4'), ('f1', '<i4')])
and back to 2d int:
In [7]: np.unique(arr.view('i,i')).view('2int')
Out[7]:
array([[1, 2],
[3, 4],
[5, 6]])
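This solution does require a certain familiarity with compound dtypes. Note that the 'i,i' view assumes the array holds 4-byte ints ('<i4', as in the outputs above); for an int64 array, use a matching view such as 'i8,i8'.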
Using return_index saves the final view back to 2d: we can index arr directly with the returned indices:
In [54]: idx=np.unique(arr.view('i,i'),return_index=True)[1]
In [55]: arr[idx,:]
Out[55]:
array([[1, 2],
[3, 4],
[5, 6]])
For what it's worth, unique sorts and then uses a mask to remove adjacent duplicates.
It's the sort that requires a 1d array; the rest works in 2d.
Here arr is already sorted:
In [42]: flag=np.concatenate([[True],(arr[1:,:]!=arr[:-1,:]).any(axis=1)])
In [43]: flag
Out[43]: array([ True, True, False, True], dtype=bool)
In [44]: arr[flag,:]
Out[44]:
array([[1, 2],
[3, 4],
[5, 6]])
https://stackoverflow.com/a/16971324/901925 shows this working with lexsort.
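The sort-then-mask idea also works on unsorted input if the rows are first ordered with np.lexsort. A sketch, with unique_rows_lexsort as a hypothetical name; note that a row counts as new when any column differs from the row before it.

```python
import numpy as np

def unique_rows_lexsort(arr):
    # Assumes a non-empty 2-D array.
    # np.lexsort treats its LAST key as the primary one, so pass the
    # columns in reverse order to sort rows by column 0, then column 1, ...
    order = np.lexsort(arr.T[::-1])
    s = arr[order]
    # Keep the first row and every row that differs from its predecessor
    flag = np.concatenate([[True], (s[1:] != s[:-1]).any(axis=1)])
    return s[flag]

arr = np.array([[3, 4], [1, 2], [3, 4], [1, 3]])
print(unique_rows_lexsort(arr).tolist())  # [[1, 2], [1, 3], [3, 4]]
```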
================
The mention of np.union1d got Divakar and me focused on numpy methods. But since the data starts as lists (of lists), it is likely to be faster to use Python set methods.
For example, using list and set comprehensions:
In [99]: [list(x) for x in {tuple(x) for x in list1+list2}]
Out[99]: [[1, 2], [3, 4], [5, 6]]
You could also take the set for each list, and do a set union.
The tuple conversion is needed because a list isn't hashable.
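The "set for each list, then set union" variant can be sketched like this; union_rows_set is a hypothetical name, and the final sorted() is there only because sets have no defined order.

```python
def union_rows_set(list1, list2):
    # Lists are unhashable, so convert each row to a tuple before using sets
    combined = {tuple(row) for row in list1} | {tuple(row) for row in list2}
    # Sets are unordered; sort only to get a deterministic result
    return [list(row) for row in sorted(combined)]

print(union_rows_set([[1, 2], [3, 4]], [[3, 4], [5, 6]]))  # [[1, 2], [3, 4], [5, 6]]
```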
One approach would be to stack the two input arrays vertically with np.vstack and then find the unique rows. That can be memory-intensive, since the stacked array holds every row, including the duplicates that are discarded afterwards.
Another approach is to find the rows in the first array that are exclusive to it, i.e. not present in the second array, and stack just those exclusive rows along with the second array. This assumes that the rows within each input array are already unique.
The crux of such a memory-saving implementation is getting those exclusive rows from the first array. To do that, we convert each row into an equivalent linear index, treating each row as an indexing tuple on an n-dimensional grid, where n is the number of columns of the input arrays (so the entries must be non-negative integers). Assuming the input arrays are arr1 and arr2, the implementation looks like this:
import numpy as np

# Get dim of ndim-grid on which linear index equivalents are to be mapped
dims = np.maximum(arr1.max(0),arr2.max(0)) + 1
# Get linear index equivalents for arr1, arr2
idx1 = np.ravel_multi_index(arr1.T,dims)
idx2 = np.ravel_multi_index(arr2.T,dims)
# Finally get the exclusive rows and stack with arr2 for desired o/p
# (np.isin is recommended over the older np.in1d for new code)
out = np.vstack((arr1[~np.isin(idx1,idx2)],arr2))
Sample run -
In [93]: arr1
Out[93]:
array([[1, 2],
[3, 4],
[5, 3]])
In [94]: arr2
Out[94]:
array([[3, 4],
[5, 6]])
In [95]: out
Out[95]:
array([[1, 2],
[5, 3],
[3, 4],
[5, 6]])
For more info on setting up those linear index equivalents, please refer to this post.

Printing part of list starting from end and "loop" back to beginning not working

>>> mylist = [ 1, 2, 3 , 5, 'd']
>>> print 'mylist[-1:2] = ',mylist[-1:2]
output is an empty list: []
I am experimenting with lists, and from what I have understood from tutorials so far, I expected the output to be ['d', 1, 2].
Can anyone explain why that isn't the output?
To understand why your code returns an empty list, it is important to note how accessing list elements in python works. Indexing with : on a python list (or slicing) works like this:
a_list[start:stop:step]
With the default (positive) step, stop must be greater than start for the slice to contain anything; otherwise the result is simply an empty list, not an error.
When you access an element in the list by using a negative number as the index, say a_list[-1], python adds the length of the list to it and gives you what you want. For ex:
a_list = [1, 2, 3, 4, 5]
a_list[-1] == a_list[5 - 1]
So mylist[-1:2] actually means mylist[4:2]. Since start is not less than stop, the slice selects nothing and you get an empty list.
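Python's built-in slice objects can show this normalization directly: slice.indices(length) returns the concrete (start, stop, step) that the slice would use on a sequence of that length.

```python
mylist = [1, 2, 3, 5, 'd']

# slice.indices() shows how Python normalizes the bounds for this length
start, stop, step = slice(-1, 2).indices(len(mylist))
print(start, stop, step)  # 4 2 1

# Normalized, the slice is mylist[4:2]; with step 1 it selects nothing
print(mylist[-1:2])  # []
print(mylist[4:2])   # []
```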
This is what you are looking for:
[mylist[-1]] + mylist[:2]
Slicing with negative numbers (see string slicing) does not turn the list into a ring buffer. A negative index is just another way of pointing to an element of the list. So what you are asking for is really the list beginning at index 4 and ending at index 2; with a positive step that selects nothing, hence the empty list.
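If ring-buffer behaviour is what you want, you can get it by reducing each index modulo the length. A minimal sketch; wrap_slice is a hypothetical helper, and it assumes a non-empty sequence (the modulo would divide by zero otherwise).

```python
def wrap_slice(seq, start, stop):
    # Take indices start..stop-1, reducing each one modulo len(seq),
    # so the selection wraps past the end like a ring buffer
    n = len(seq)
    return [seq[i % n] for i in range(start, stop)]

mylist = [1, 2, 3, 5, 'd']
print(wrap_slice(mylist, -1, 2))  # ['d', 1, 2]
```

Like normal slicing, the stop index is excluded, so this reproduces the ['d', 1, 2] the question expected.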
Slicing does not work that way in Python. Instead, I suggest defining your own function, like below:
>>> def loop(list, start, end):
... array = []
... for k in range(start, end+1):
... array.append(list[k])
... return array
...
>>> x = [1, 2, 3, 4, 5]
>>> desired_output = [5, 1, 2, 3]
>>> loop(x, -1, 2)
[5, 1, 2, 3]
>>> loop(x, -1, 2) == desired_output
True
>>>
This loops over the indices from start to end (inclusive), appends each corresponding element to a new list, and returns it. Negative indices work because list[k] accepts them.
Or you can concatenate two slices. Like normal slice notation, this excludes the stop index, so it gives [5, 1, 2]:
[x[-1]] + x[:2]