Python 3: comparing elements of two lists of lists

I'm trying to compare elements of 2 lists of lists in python. I want to create a new list (ph) which has a 1 if elements of lists from the 1st list of lists are in the elements of the 2nd list of lists.
However, this seems to compare the whole list and not individual elements. The code is below. Many thanks for the help! :)
import numpy as np
import pandas as pd
abc = [[1,800000,3],[4,5,6],[100000,7,8]]
l = [[
    [i for i in range(0, 100000)],
    [i for i in range(200000,300000)],
    [i for i in range(400000,500000)],
    [i for i in range(600000,700000)],
    [i for i in range(800000,900000)],
    [i for i in range(1000000,1100000)]
]]
ph = []
for i in abc:
    for j in l:
        if l[0] == abc[0]:
            ph.append(1)
        else:
            ph.append(0)
print(ph)

The goal of your problem is somewhat unclear to me. Correct me if I'm wrong, but what you want is: for each sublist of abc, get a boolean describing whether all of its elements appear anywhere in l. Is that it?
If it is indeed the case, here's my answer.
First of all, your second list is not a list of lists but a list of lists of lists. Hence, I removed a nested list in my code.
abc = [[1,800000,3],[4,5,6],[100000,7,8]]
L = [
    [i for i in range(0, 100000)],
    [i for i in range(200000,300000)],
    [i for i in range(400000,500000)],
    [i for i in range(600000,700000)],
    [i for i in range(800000,900000)],
    [i for i in range(1000000,1100000)]
]
flattened_L = sum(L, [])
print(
    list(map(lambda sublist: all(x in flattened_L for x in sublist), abc))
)
# returns [True, True, False]
My code first flattens L so that it becomes easy to check whether any element is in it or not. Then, for each sublist in abc, it checks whether all of its elements are in this flattened list.
Note: my code returns a list of booleans. If you absolutely need integer values (0 and 1), which you shouldn't, you can wrap int around all.
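On lists this large the membership tests can get slow, because each x in flattened_L scans a list of up to 600000 items and sum(L, []) copies the list repeatedly. A minimal sketch of a faster variant, assuming the values are hashable (they are integers here): build a set once and wrap all in int to get the 0/1 list the question asked for. The names ranges, members, and ph are only illustrative.

```python
from itertools import chain

abc = [[1, 800000, 3], [4, 5, 6], [100000, 7, 8]]
ranges = [
    range(0, 100000),
    range(200000, 300000),
    range(400000, 500000),
    range(600000, 700000),
    range(800000, 900000),
    range(1000000, 1100000),
]

# Flatten once into a set so each membership test is O(1) on average.
members = set(chain.from_iterable(ranges))

# 1 if every element of the sublist is present, otherwise 0.
ph = [int(all(x in members for x in sublist)) for sublist in abc]
print(ph)  # [1, 1, 0]
```

The last sublist yields 0 because 100000 is excluded from range(0, 100000).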

Related

Unique elements inside lists of list

If I have a nested list like:
l = [['AB','BCD','TGH'], ['UTY','AB','WEQ'],['XZY','LIY']]
In this example, 'AB' is common to the first two nested lists. How can I remove 'AB' from both lists while keeping the other elements as they are? In general, how can I remove every element that occurs in two or more nested lists, so that each nested list contains only elements unique to it?
l = [['BCD','TGH'],['UTY','WEQ'],['XZY','LIY']]
Is it possible to do this with a for loop?
Thanks
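from collections import Counter
from itertools import chain
ls = [['AB','BCD','TGH'], ['UTY','AB','WEQ'],['XZY','LIY']]  # the nested list from the question
counts = Counter(chain(*ls)) # find counts across all sublists
result = [[e for e in sub if counts[e] == 1] for sub in ls] # take uniqs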
One option is to do something like this:
from collections import Counter
counts = Counter([b for a in l for b in a])
for a in l:
    for b in a[:]:  # iterate over a copy, because a is modified inside the loop
        if counts[b] > 1:
            a.remove(b)
Edit: If you want to avoid the (awfully useful standard library) collections module (cf. the comment), you could replace counts above by the following custom counter:
counts = {}
for a in l:
    for b in a:
        if b in counts:
            counts[b] += 1
        else:
            counts[b] = 1
A somewhat short solution without imports would be to create a reduced version of the original list first, then iterate through the original list and remove elements with counts greater than 1:
lst = [['AB','BCD','TGH'], ['UTY','AB','WEQ'],['XZY','LIY']]
reduced_lst = [y for x in lst for y in x]  # flat list of all elements
output_lst = []
for chunk in lst:
    chunk_copy = chunk[:]  # copy, so the original sublist is left untouched
    for elm in chunk:
        if reduced_lst.count(elm)>1:
            chunk_copy.remove(elm)
    output_lst.append(chunk_copy)
print(output_lst)
Should print:
[['BCD', 'TGH'], ['UTY', 'WEQ'], ['XZY', 'LIY']]
I hope this proves useful.
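One detail the counting approaches above share: they count every occurrence, so an element repeated inside a single sublist is also dropped. If the intent is strictly "occurs in two or more nested lists", a possible sketch is to count each element once per sublist. The function name drop_shared is made up for this illustration.

```python
from collections import Counter

def drop_shared(nested):
    """Return a copy of nested without elements that occur in 2+ sublists."""
    # set(sub) makes each sublist contribute at most 1 to an element's count.
    counts = Counter(e for sub in nested for e in set(sub))
    return [[e for e in sub if counts[e] == 1] for sub in nested]

l = [['AB', 'BCD', 'TGH'], ['UTY', 'AB', 'WEQ'], ['XZY', 'LIY']]
print(drop_shared(l))
# [['BCD', 'TGH'], ['UTY', 'WEQ'], ['XZY', 'LIY']]
```

The input list is not modified, and an element repeated only within one sublist (for example ['X', 'X']) is kept.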

Python 3: fastest way to compute the frequency, in one large list, of the words from another large word list

There are two lists. One, lst1 = [word1, word2, ...], has more than 40000 items. The other, lst2 = [word1, word2, ...], has about 10100 items. lst2 holds the feature words, and I want the frequency of each lst2 word in lst1. For example:
lst1 = ['I', 'am', 'foot', 'girl', 'mom', 'fish', 'mom', 'baby']
lst2 = ['mom', 'baby', 'mother']
So the frequencies of the lst2 words in lst1 are 'mom': 2, 'baby': 1, 'mother': 0. My code is as follows:
def pronoun_feature(lst1, lst2):
    dict_p = {}
    for item in lst2:
        if item in lst1:
            num_item = lst1.count(item)
            dict_p.update({item: num_item})
        else:
            dict_p.update({item: 0})

    return dict_p
Because both lists are very large, the computation takes about 0.02-0.1 s. Is there a faster way to get this result? Thanks in advance!
Have you tried Counter?
The code looks like this:
from collections import Counter
def pronoun_feature(lst1, lst2):
    counts = Counter(lst1)
    dict_p = {}
    for item in lst2:
        dict_p[item] = counts[item]
    return dict_p
We won't need if/else or try/except in case of items from lst2 not being in lst1 here, because according to the docs:
Counter objects have a dictionary interface except that they return a
zero count for missing items instead of raising a KeyError
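For reference, here is a compact sketch of the same Counter idea written as a dict comprehension and run on the example from the question. It makes one pass over lst1 to build the counts and one pass over lst2 to look them up, instead of calling lst1.count once per feature word.

```python
from collections import Counter

def pronoun_feature(lst1, lst2):
    # Count every word of lst1 once, then look up each feature word.
    counts = Counter(lst1)
    # Counter returns 0 for words that never occur in lst1.
    return {word: counts[word] for word in lst2}

lst1 = ['I', 'am', 'foot', 'girl', 'mom', 'fish', 'mom', 'baby']
lst2 = ['mom', 'baby', 'mother']
result = pronoun_feature(lst1, lst2)
print(result)  # {'mom': 2, 'baby': 1, 'mother': 0}
```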

how to find whether this python list within list contains duplicates or not?

I have the following Python list:
a=[['t1', ['a', 'c']], ['t2', ['b']], ['t2', ['b']]]
It contains the duplicate list ['t2', ['b']] twice.
I want to return True if the list contains duplicates.
Can anyone please help me do this?
I tried to use the set function, but it does not work here!
If you are free to represent your list a as a list of tuples/records:
b = [(item[0], tuple(item[1])) for item in a]
(or, in the first place):
a = [('t1', ('a', 'c')), ('t2', ('b',)), ('t2', ('b',))]  # note the trailing comma: ('b') is just the string 'b'
Then the items become hashable and you can use collections.Counter:
from collections import Counter
c = Counter(b)
So you can find duplicates like:
duplicated_items = [key for key, count in c.items() if count > 1]
(Or you can use set):
has_duplicates = len(set(b)) < len(b)
If your aim is to remove duplicates, this answer might be helpful:
unique_a = [i for n, i in enumerate(a) if i not in a[:n]]
You can rewrite it this way:
has_duplicates = lambda l: True in [(i in l[:n]) for n, i in enumerate(l)]
You can call it this way:
has_duplicates(a) # True
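If the nesting depth is not fixed, one possible sketch is to convert lists to tuples recursively and track the converted items in a set, which stops at the first duplicate. The helper name to_hashable is made up for this example, and the sketch assumes the innermost values (strings here) are hashable.

```python
def to_hashable(item):
    """Recursively turn lists into tuples so the result can go in a set."""
    if isinstance(item, list):
        return tuple(to_hashable(x) for x in item)
    return item

def has_duplicates(items):
    seen = set()
    for item in items:
        key = to_hashable(item)
        if key in seen:
            return True  # stop at the first repeated item
        seen.add(key)
    return False

a = [['t1', ['a', 'c']], ['t2', ['b']], ['t2', ['b']]]
print(has_duplicates(a))  # True
```

One caveat: a list and a tuple with the same contents map to the same key here, so they are treated as duplicates of each other.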

Intersection of two nested lists in Python

I have a problem with nested lists. I want to compute the length of the intersection of two nested lists in Python. My lists are composed as follows:
list1 = [[1,2], [2,3], [3,4]]
list2 = [[1,2], [6,7], [4,5]]
output_list = [[1,2]]
How can I compute the intersection of the two lists?
I think there are two reasonable approaches to solving this issue.
If you don't have very many items in your top level lists, you can simply check if each sub-list in one of them is present in the other:
intersection = [inner for inner in list1 if inner in list2]
The in operator tests for equality, so different list objects with the same contents will be found as expected. This is not very efficient, however, since each list membership test has to iterate over the sublists of list2. In other words, its performance is O(len(list1)*len(list2)), so if your lists are long it may take more time than you want it to.
A more asymptotically efficient alternative approach is to convert the inner lists to tuples and turn the top level lists into sets. You don't actually need to write any loops yourself for this, as map and the set type's & operator will take care of it all for you:
intersection_set = set(map(tuple, list1)) & set(map(tuple, list2))
If you need your result to be a list of lists, you can of course, convert the set of tuples back into a list of lists:
intersection_list = list(map(list, intersection_set))
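If the order of list1 matters, or the result should stay a list of lists, one possible middle ground (a sketch, not the only way) is to convert only list2 into a set of tuples and filter list1 against it. The name set2 is introduced here for illustration.

```python
list1 = [[1, 2], [2, 3], [3, 4]]
list2 = [[1, 2], [6, 7], [4, 5]]

# Tuples are hashable, so the sublists of list2 can live in a set.
set2 = set(map(tuple, list2))

# Keep the sublists of list1, in their original order, that also occur in list2.
intersection = [inner for inner in list1 if tuple(inner) in set2]

print(intersection)       # [[1, 2]]
print(len(intersection))  # 1
```

Unlike the set-based version, this keeps a sublist twice if it appears twice in list1.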
What about using sets in python?
>>> set1={(1,2),(2,3),(3,4)}
>>> set2={(1,2),(6,7),(4,5)}
>>> set1 & set2
{(1, 2)}
>>> len(set1 & set2)
1
import json
list1 = [[1,2], [2,3], [3,4]]
list2 = [[1,2], [6,7], [4,5]]
list1_str = map(json.dumps, list1)  # serialize each sublist to a hashable string
list2_str = map(json.dumps, list2)
output_set_str = set(list1_str) & set(list2_str)
output_list = list(map(json.loads, output_set_str))  # map is lazy in Python 3
print(output_list)

Python 3.3 functions on pairs in a list

I am trying to create a program that will find the difference between all pairs in a list. For example
[2,4,6]
Would then make a list containing the difference
[2,2]
Is there a way to do this?
Itertools Recipes: pairwise
from itertools import tee
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)
def diffs(iterable):
    return [b - a for a, b in pairwise(iterable)]
print(diffs([2,4,6]))
[L[i+1] - L[i] for i in range(len(L)-1)] will do it.
Some other ways also using a list comprehension:
[L[i+1] - L[i] for i in range(len(L[:-1]))]
[L[i] - L[i-1] for i in range(1, len(L))]
Using map:
list(map(lambda i: L[i+1]-L[i], range(len(L[:-1]))))
list(map(lambda i: L[i]-L[i-1], range(1, len(L))))
Using map and the operator module:
import operator
list(map(operator.sub, L[1:], L[:-1]))
Using zip (this one is probably the nicest way, imo):
[x - y for x, y in zip(L[1:], L[:-1])]
A more verbose approach if you aren't familiar with list comprehensions or with map (GET FAMILIAR!):
def differences(L1,L2):
    L = []
    for V1,V2 in zip(L1,L2):
        L.append(V2-V1)
    return L
diffs = differences(L[:-1],L[1:])
And a similar, but much better way to do it using a generator:
def differences(L1,L2):
    for V1,V2 in zip(L1,L2):
        yield V2-V1
diffs = list(differences(L[:-1],L[1:]))
And here is the generator expression equivalent of the above generator (notice it's almost exactly the same as the last list comprehension above, except it uses the list function instead of brackets):
list(V2-V1 for V1,V2 in zip(L[:-1],L[1:]))
Study all of these ways of doing it very closely and you will learn a lot of Python.
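As a quick check, the index-based, zip, and operator.sub variants can be run side by side on the example list from the question. This is only a small sketch; the names by_index, by_zip, and by_map are illustrative.

```python
import operator

L = [2, 4, 6]

# Index-based: one difference per adjacent pair, so len(L) - 1 results.
by_index = [L[i + 1] - L[i] for i in range(len(L) - 1)]

# zip pairs each element with its successor and stops at the shorter input.
by_zip = [y - x for x, y in zip(L, L[1:])]

# operator.sub(a, b) computes a - b, so the shifted slice goes first.
by_map = list(map(operator.sub, L[1:], L[:-1]))

print(by_index, by_zip, by_map)  # [2, 2] [2, 2] [2, 2]
```

All three return an empty list when L has fewer than two elements.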