Select duplicated lists from a list of lists (Python 2.7.13) - python-2.7

I have two lists: the first is a list of lists and the second is a flat list of numbers, and both have the same length (each entry in the first list is a pair, so it holds twice as many individual values), like this:
list1=[['47', '43'], ['299', '295'], ['47', '43'], etc.]
list2=[9.649, 9.612, 9.42, etc.]
I want to detect the repeated pairs in the first list (and delete them), summing the values at the corresponding indexes in the second list, to create an output like this:
list1=[['47', '43'], ['299', '295'], etc.]
list2=[19.069, 9.612, etc.]
The main problem is that the order of the values is important and I'm really stuck.

You could create a collections.defaultdict to sum values together, with the sublists (converted to tuples so they're hashable) as keys:
list1=[['47', '43'], ['299', '295'], ['47', '43']]
list2=[9.649, 9.612, 9.42]
import collections
c = collections.defaultdict(float)
for l, v in zip(list1, list2):
    c[tuple(l)] += v
print(c)
An alternative using collections.Counter which does the same summation (building one single-entry Counter per pair and adding them all up):
c = sum((collections.Counter({tuple(k): v}) for k, v in zip(list1, list2)), collections.Counter())
At this point, we have the related data:
defaultdict(<class 'float'>, {('299', '295'): 9.612, ('47', '43'): 19.069})
Now, if needed (not sure it is, since the dictionary already holds the data well), we can rebuild the lists, keeping the relative order between them (though not their original order, which shouldn't be a problem since pairs and sums stay linked):
list1=[]
list2=[]
for k, v in c.items():
    list1.append(list(k))
    list2.append(v)
print(list1,list2)
result:
[['299', '295'], ['47', '43']]
[9.612, 19.069]
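If you also need to keep the pairs in their original first-seen order (the question stresses that order matters), one possible variant, not part of the original answer, accumulates into a collections.OrderedDict instead:
import collections

list1 = [['47', '43'], ['299', '295'], ['47', '43']]
list2 = [9.649, 9.612, 9.42]

# OrderedDict remembers insertion order, so each pair keeps its first-seen position
c = collections.OrderedDict()
for l, v in zip(list1, list2):
    key = tuple(l)
    c[key] = c.get(key, 0.0) + v

list1 = [list(k) for k in c]
list2 = list(c.values())
print(list1)  # [['47', '43'], ['299', '295']]
print(list2)  # [19.069, 9.612]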

Related

Is there an algorithm/way to find out how different (or the minimum distance between) 2 list orders?

I have a bunch of items I want to rate in a specific order. For example:
["Person1", "Person2", "Person3", "Person4", "Person5"]
Which can be ordered like this:
["Person4", "Person5", "Person3", "Person1", "Person2"]
Given 2 different orders of the same list, is there a way to quantify how different they are?
I know Levenshtein distance exists for strings, and I'm looking for something similar.
My ideal measurement for distance would be the minimum number of switches between two adjacent items required to change one list to the other - but I'm open to other algorithms if you think they're better.
The answer I'm looking for is an algorithm (and preferably, a [Python] implementation) to perform this kind of measurement (fast).
Thanks in advance!
To quantify how "different" two strings are, as you already noted, you can use Levenshtein distance, which is implemented in this library:
pip install levenshtein
>>> import Levenshtein
>>> Levenshtein.distance("lewenstein", "levenshtein")
2
To determine how "different" two lists are, you could assign each value in the list to a Unicode character.
import Levenshtein
def list_distance(A, B):
    # Assign each unique value of the lists to a unicode character
    unique_map = {v: chr(k) for (k, v) in enumerate(set(A + B))}
    # Create string versions of the lists
    a = ''.join(map(unique_map.get, A))
    b = ''.join(map(unique_map.get, B))
    return Levenshtein.distance(a, b)
A = ["Person1", "Person2", "Person3", "Person4", "Person5"]
B = ["Person4", "Person5", "Person3", "Person1", "Person2"]
list_distance(A, B)
returns 4.
This works by making a unique mapping to arbitrary Unicode characters, for example:
the list A to the string '\x03\x02\x01\x00\x04' and
the list B to the string '\x00\x04\x01\x03\x02',
before taking the Levenshtein distance of the two strings.
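If you specifically want the measure described in the question, the minimum number of adjacent swaps, that is the Kendall tau (inversion count) distance rather than Levenshtein. Here is a rough sketch, assuming both lists contain exactly the same items with no duplicates; the O(n^2) inversion count below could be replaced by an O(n log n) merge-sort count if speed matters:
def adjacent_swap_distance(A, B):
    # Position of each item in the target ordering B
    pos = {item: i for i, item in enumerate(B)}
    # Rewrite A as a permutation of B's indexes
    perm = [pos[item] for item in A]
    # Count inversions: each inversion costs exactly one adjacent swap
    inversions = 0
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                inversions += 1
    return inversions

A = ["Person1", "Person2", "Person3", "Person4", "Person5"]
B = ["Person4", "Person5", "Person3", "Person1", "Person2"]
print(adjacent_swap_distance(A, B))  # 8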

How to combine two lists together that are being formed while iterating another list?

I'm using Scrapy to iteratively scrape some data, and each iteration outputs two lists. I want to combine the two lists into one list at each iteration, so that in the end I have one big list with many sublists (each sublist being the combination of the two lists created in one iteration).
That may be confusing so I will show my current output and code:
Using Scrapy I'm iterating in the following way:
for i in response.css("tr.insider....."):
    i.css('a.tab-link::text').extract()  # creating the first list
    i.css('td::text').extract()  # creating the second list
So the current output is something like this
[A,B,C] #first iteration
[1,2,3]
[D,E,F] #second iteration
[4,5,6]
[G,H,I] #third iteration
[7,8,9]
Desired output is
[[A,B,C,1,2,3], [D,E,F,4,5,6],[G,H,I,7,8,9]]
I tried the following code but I'm getting a list of None.
x = []
for i in response.css("tr.insider....."):
    x.append(i.css('a.tab-link::text').extract().extend(i.css('td::text').extract()))
But the return is just
None
None
None
None
None.....
Thanks!
The extend function returns None, so you always append None to x.
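A quick illustration of that behaviour (not from the original answer):
a = [1, 2, 3]
print(a.extend([4, 5]))  # None: extend mutates the list in place and returns nothing
print(a)                 # [1, 2, 3, 4, 5]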
For your purpose, I think this is what you want:
for i in response.css("tr.insider....."):
    i.css('a.tab-link::text, td::text').extract()
You can simply add the two lists together and append the result to your results list.
results = []
for i in response.css("tr.insider....."):
    first = i.css('a.tab-link::text').extract()
    second = i.css('td::text').extract()
    # combine both and append to results
    results.append(first + second)
print(results)
# e.g.: [[A,B,C,1,2,3], [D,E,F,4,5,6],[G,H,I,7,8,9]]

How to find the union of multiple lists of sub-lists

I have 6 different lists of lists similar to
list1=[['hello',1,2,'b3'],['world',1,2,'b4']]
list2=[['yo',4,5,'ba'],['lolz',1,4.35,'b4']]
list3=[['yo',4,5,'ba'],['world',3,4.35,'b6']]
list4=[['test',4,5,'b6'],['test',4,5,'b6']]
Each list can have around 100 sub-lists, but the sub-lists always have 4 entries. I want to find all the sub-lists that appear more than once and put them into a final list, so it would look something like
final=[['yo',4,5,'ba'],['test',4,5,'b6']]
The pattern is important, so the entries within each sub-list need to stay in order, but the order of the sub-lists doesn't matter. What is the best way I could do this? Thank you for your help.
Assuming that there are no unhashable elements in the sublists, I would convert them to tuples and feed them to collections.Counter:
from collections import Counter
big_list = [list1, list2, ...]
c = Counter(tuple(sublist) for l in big_list for sublist in l)
final = [list(i) for i in c if c[i] > 1]
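For example, filling in big_list with the four lists shown above (the original answer elides them), this prints the two sub-lists that occur more than once; note that Counter iteration order is arbitrary, so the result order may vary:
from collections import Counter

list1 = [['hello', 1, 2, 'b3'], ['world', 1, 2, 'b4']]
list2 = [['yo', 4, 5, 'ba'], ['lolz', 1, 4.35, 'b4']]
list3 = [['yo', 4, 5, 'ba'], ['world', 3, 4.35, 'b6']]
list4 = [['test', 4, 5, 'b6'], ['test', 4, 5, 'b6']]

big_list = [list1, list2, list3, list4]
c = Counter(tuple(sublist) for l in big_list for sublist in l)
final = [list(t) for t in c if c[t] > 1]
print(final)  # e.g. [['yo', 4, 5, 'ba'], ['test', 4, 5, 'b6']]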

How to read each element within a tuple from a list

I want to write a program which will read in a list of tuples, and in the tuple it will contain two elements. The first element can be an Object, and the second element will be the quantity of that Object. Just like: Mylist([{Object1,Numbers},{Object2, Numbers}]).
Then I want to read in the Numbers and print the related Object Numbers times and then store them in a list.
So if Mylist([{lol, 3},{lmao, 2}]), then I should get [lol, lol, lol, lmao, lmao] as the final result.
My thought is to first unzip those tuples (imagine there are more than 2) into two tuples, where the first contains the Objects and the second contains the quantity numbers.
After that, read the numbers in the second tuple and print the related Object from the first tuple that exact number of times. But I don't know how to do this. Thanks for any help!
A list comprehension can do that:
lists:flatten([lists:duplicate(N,A) || {A, N} <- L]).
If you really want printing too, use recursion:
p([]) -> [];
p([{A,N}|T]) ->
    FmtString = string:join(lists:duplicate(N,"~p"), " ") ++ "\n",
    D = lists:duplicate(N,A),
    io:format(FmtString, D),
    D ++ p(T).
This code creates a format string for io:format/2 using lists:duplicate/2 to replicate the "~p" format specifier N times, joins them with a space with string:join/2, and adds a newline. It then uses lists:duplicate/2 again to get a list of N copies of A, prints those N items using the format string, and then combines the list with the result of a recursive call to create the function result.

Comparing dictionaries with list-type values

I have the following 2 dictionaries,
d1={"aa":[1,2,3],"bb":[4,5,6],"cc":[7,8,9]}
d2={"aa":[1,2,3],"bb":[1,1,1,1,1,1],"cc":[7,8]}
How could I compare these two dictionaries and get the positions (indexes) of the UNMATCHED key-value pairs? Since I am dealing with files of around 2 GB, the dictionaries contain very large amounts of data. How can this be implemented in an optimized way?
def getUniqueEntry(dictionary1, dictionary2, listOfKeys):
    # check that they have the same keys
    assert sorted(dictionary1.keys()) == sorted(dictionary2.keys()), "Keys don't match"
    for key in dictionary1:
        if dictionary1[key] != dictionary2[key]:
            listOfKeys.append(key)
When calling the function, the third parameter listOfKeys is an empty list where you want the keys to be stored. Note that reading 2 GB worth of data into a dict requires a lot of RAM and will most likely fail.
And this is a more pythonic way: the list comprehension keeps just the keys whose values are not equal in both dictionaries:
different_keys = [key for key in d1 if d1[key] != d2[key]]
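For example, with d1 and d2 from the question (and getUniqueEntry defined as above), both approaches report 'bb' and 'cc' as the mismatched keys:
d1 = {"aa": [1, 2, 3], "bb": [4, 5, 6], "cc": [7, 8, 9]}
d2 = {"aa": [1, 2, 3], "bb": [1, 1, 1, 1, 1, 1], "cc": [7, 8]}

keys = []
getUniqueEntry(d1, d2, keys)
print(sorted(keys))  # ['bb', 'cc']

different_keys = [key for key in d1 if d1[key] != d2[key]]
print(sorted(different_keys))  # ['bb', 'cc']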