Subset a list of tuples by max value in Python

My question arises from this discussion. I apologize, but I could not add a comment under another answer to ask my question because of my reputation level. I have this list of tuples:
my_list = [('Scaffold100019', 98310), ('Scaffold100019', 14807), ('Scaffold100425', 197577), ('Scaffold100636', 326), ('Scaffold10064', 85415), ('Scaffold10064', 94518)]
I would like to make a dictionary that stores only the max value for each key, where the key is the first element of the tuple:
my_dict = {'Scaffold100019': 98310, 'Scaffold100425': 197577, 'Scaffold100636': 326, 'Scaffold10064': 94518}
Starting from Marcus Müller's answer, I have:
d = {}
#build a dictionary of lists
for x,y in my_list: d.setdefault(x,[]).append(y)
my_dict = {}
#build a dictionary with the max value only
for item in d: my_dict[item] = max(d[item])
This reaches my goal, but is there a sleeker way to complete this task?

I suggest this solution, which needs only one loop and is quite readable:
my_dict = {}
for x, y in my_list:
    if x in my_dict:
        my_dict[x] = max(y, my_dict[x])
    else:
        my_dict[x] = y
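Alternatively, the question's own two-step approach can be kept while condensing the second loop into a dict comprehension. This is a minimal sketch, assuming Python 3 (hence `items()`):

```python
my_list = [('Scaffold100019', 98310), ('Scaffold100019', 14807),
           ('Scaffold100425', 197577), ('Scaffold100636', 326),
           ('Scaffold10064', 85415), ('Scaffold10064', 94518)]

# Step 1: group every value under its key, as in the question.
grouped = {}
for key, value in my_list:
    grouped.setdefault(key, []).append(value)

# Step 2: keep only the maximum of each group.
my_dict = {key: max(values) for key, values in grouped.items()}
print(my_dict)
```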

You could use collections.defaultdict.
from collections import defaultdict
d = defaultdict(int)
for key, value in my_list:
    d[key] = max(d[key], value)
The above code works on your example data, but will only work in general if each key has a maximum value that is nonnegative. This is because defaultdict(int) returns zero when no value is set, so if all values for a given key are negative, the resulting max will incorrectly be zero.
If purely negative values are possible for a given key, you can make the following alteration:
d = defaultdict(lambda: -float('inf'))
With this alteration, negative infinity will be returned when a key isn't set, so negative values are no longer a concern.
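A small sketch of the pitfall described above, using hypothetical data in which every value for key `'a'` is negative:

```python
from collections import defaultdict

# Hypothetical data: all values for 'a' are negative.
pairs = [('a', -5), ('a', -2), ('b', 3)]

naive = defaultdict(int)                   # missing keys start at 0
safe = defaultdict(lambda: -float('inf'))  # missing keys start at -inf
for key, value in pairs:
    naive[key] = max(naive[key], value)
    safe[key] = max(safe[key], value)

print(dict(naive))  # {'a': 0, 'b': 3}  (0 is wrong for 'a')
print(dict(safe))   # {'a': -2, 'b': 3}
```

The `-inf` default never wins a comparison against a real number, so the stored value is always one that actually appeared in the data.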

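In Python 2, you can use the fact that every value compares greater than None, together with the dictionary's get method with None as the fallback return value. (In Python 3, comparing a number with None raises TypeError, so this trick does not work there.)
>>> d = {}
>>> for name, value in my_list:
...     if value > d.get(name, None):
...         d[name] = value
...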
>>> d
{'Scaffold100425': 197577, 'Scaffold10064': 94518, 'Scaffold100019': 98310, 'Scaffold100636': 326}
In Python 2 this works for all values, and each name is hashed at most twice per iteration.
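Since the None comparison fails in Python 3, here is a minimal sketch of an equivalent single-loop version that avoids it by checking membership first:

```python
my_list = [('Scaffold100019', 98310), ('Scaffold100019', 14807),
           ('Scaffold100425', 197577), ('Scaffold100636', 326),
           ('Scaffold10064', 85415), ('Scaffold10064', 94518)]

d = {}
for name, value in my_list:
    # Store the value if the key is new or the value beats the current max.
    if name not in d or value > d[name]:
        d[name] = value

print(d)
```

Because `or` short-circuits, `d[name]` is only read when the key already exists, so negative values are handled without any sentinel.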

Related

Indexing a list of dictionaries for a related value

I have 4 dictionaries which have been put into a list:
dict1 = {'A':'B'}
dict2 = {'C':'D'}
dict3 = {'E':'F'}
dict4 = {'G':'H'}
list = [dict1, dict2, dict3, dict4]
value = 'D'
print (the relating value to D)
Using the list of dictionaries, I would like to look up the key related to the value 'D' (which is 'C').
Is this possible?
Note: the list doesn't have to be used; the program just needs to find 'C' from 'D' by going through the 4 dictionaries in one way or another.
Thanks!
You have a list of dictionaries. A straightforward way is to loop over the list and search each dictionary for the desired value using
dict.iteritems()
which iterates over the dictionary and yields each key/value pair as a tuple (key, value). (In Python 3, use dict.items() instead.) All that's left to do is find the desired value and return the associated key. Here is some quick code I tried. It should also work for dictionaries with any number of key/value pairs.
dict1 = {'A':'B'}
dict2 = {'C':'D'}
dict3 = {'E':'F'}
dict4 = {'G':'H'}
list = [dict1, dict2, dict3, dict4]
def find_in_dict(dictx, search_parameter):
    for x, y in dictx.iteritems():
        if y == search_parameter:
            return x
for d in list:
    my_key = find_in_dict(d, "D")
    print my_key or "No key found"
On a different note, this usage of dictionaries is a little awkward to me, as it defeats the purpose of having a KEY as an index for an associated VALUE. But that's just my opinion, and I am not aware of your use case. Hope it helps.
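A more compact Python 3 variant of the same search is sketched below. It uses a generator with `next()` to stop at the first match, returning `None` if no dictionary contains the value. The helper name `find_key` is hypothetical, and the list is called `dicts` to avoid shadowing the built-in `list`:

```python
dicts = [{'A': 'B'}, {'C': 'D'}, {'E': 'F'}, {'G': 'H'}]

def find_key(dicts, target):
    """Return the first key whose value equals target, or None if absent."""
    return next((key for d in dicts for key, value in d.items()
                 if value == target), None)

print(find_key(dicts, 'D'))  # C
print(find_key(dicts, 'Z'))  # None
```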

Create a dict with a comprehension

This:
index = {}
for item in args:
    for array in item:
        for k, v in json.loads(array).iteritems():
            for value in v:
                index.setdefault(k, []).append({'values': value['id']})
works.
But, when I try this:
index ={}
filt = {index.setdefault(k,[]).append(value['id']) for item in args for array in item for (k,v) in json.loads(array).iteritems() for value in v}
print filt
Output:
result set([None])
What's wrong?
list.append is an in-place method that returns None (dict.setdefault returns the list, but calling .append on it returns None). So you are creating a set of Nones, and since sets cannot contain duplicates, you are left with set([None]):
In [27]: d = {}
In [28]: print(d.setdefault(1,[]).append(1)) # returns None
None
In [35]: d = {}
In [36]: {d.setdefault(k,[]).append(1) for k in range(2)} # a set comprehension
Out[36]: {None}
In [37]: d
Out[37]: {0: [1], 1: [1]}
The index dict (like d above) does get updated, but using a comprehension for its side effects is not a good approach. You also cannot replicate the for-loop/setdefault logic with a dict comprehension: when a key repeats, a later value overwrites the earlier one instead of being appended to it.
What you could do is use a defaultdict with list.extend:
from collections import defaultdict
index = defaultdict(list)
for item in args:
    for array in item:
        for k, v in json.loads(array).iteritems():
            index[k].extend({'values': value['id']} for value in v)
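A self-contained Python 3 sketch of the defaultdict approach is below. The shape of `args` is an assumption, since the question does not show it: an iterable of groups, each holding JSON strings whose values are lists of objects with an `'id'` field.

```python
import json
from collections import defaultdict

# Hypothetical input shape (not from the question).
args = [
    ['{"x": [{"id": 1}, {"id": 2}]}', '{"y": [{"id": 3}]}'],
    ['{"x": [{"id": 4}]}'],
]

index = defaultdict(list)
for item in args:
    for array in item:
        # items() replaces Python 2's iteritems().
        for k, v in json.loads(array).items():
            index[k].extend({'values': value['id']} for value in v)

print(dict(index))
```

Entries for a repeated key (here `"x"`) accumulate across JSON strings, which is exactly what a dict comprehension cannot do.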

How do I extract part of a tuple that's duplicate as key to a dictionary, and have the second part of the tuple as value?

I'm pretty new to Python and QGIS. Right now I'm just running scripts, but my end goal is to create a plugin.
Here's the part of the code I'm having problems with:
import math
layer = qgis.utils.iface.activeLayer()
iter = layer.getFeatures()
dict = {}
#iterate over features
for feature in iter:
    #print feature.id()
    geom = feature.geometry()
    coord = geom.asPolyline()
    points = geom.asPolyline()
    #get Endpoints
    first = points[0]
    last = points[-1]
    #Assemble Features
    dict[feature.id()] = [first, last]
print dict
This is my result:
{0L: [(355277,6.68901e+06), (355385,6.68906e+06)], 1L: [(355238,6.68909e+06), (355340,6.68915e+06)], 2L: [(355340,6.68915e+06), (355452,6.68921e+06)], 3L: [(355340,6.68915e+06), (355364,6.6891e+06)], 4L: [(355364,6.6891e+06), (355385,6.68906e+06)], 5L: [(355261,6.68905e+06), (355364,6.6891e+06)], 6L: [(355364,6.6891e+06), (355481,6.68916e+06)], 7L: [(355385,6.68906e+06), (355501,6.68912e+06)]}
As you can see, many of the lines share an endpoint: (355385,6.68906e+06), for example, is shared by 7L, 4L and 0L.
I would like to create a new dictionary that uses each shared point as a key and the other endpoints of the lines touching it as values.
e.g.: {(355385,6.68906e+06):[(355277,6.68901e+06), (355364,6.6891e+06), (355501,6.68912e+06)]}
I have been looking through list comprehension tutorials, but without much success: most people are looking to delete duplicates, whereas I would like to use them as keys (with unique IDs). Am I correct in thinking set() would still be useful?
I would be very grateful for any help, thanks in advance.
Maybe this is what you need?
dictionary = {}
for i in dict:
    for j in dict:
        c = set(dict[i]).intersection(set(dict[j]))
        if len(c) == 1:
            # ok, so now we know, that exactly one tuple exists in both
            # sets at the same time, but this one will be the key to new dictionary
            # we need the second tuple from the set to become value for this new key
            # so we can subtract the key-tuple from set to get the other tuple
            d = set(dict[i]).difference(c)
            # Now we need to get tuple back from the set
            # by doing list(c) we get list
            # and our tuple is the first element in the list, thus list(c)[0]
            c = list(c)[0]
            dictionary[c] = list(d)[0]
        else: pass
This code attaches only one tuple to each key in the dictionary. If you want multiple values per key, you can give each key a list of values instead, by changing the assignment to:
# some_value cannot be a set, it can be obtained with c = list(c)[0]
key = some_value
dictionary.setdefault(key, [])
dictionary[key].append(value)
So the complete version would be:
dictionary = {}
for i in dict:
    for j in dict:
        c = set(dict[i]).intersection(set(dict[j]))
        if len(c) == 1:
            d = set(dict[i]).difference(c)
            c = list(c)[0]
            value = list(d)[0]
            if c in dictionary and value not in dictionary[c]:
                dictionary[c].append(value)
            elif c not in dictionary:
                dictionary.setdefault(c, [])
                dictionary[c].append(value)
        else: pass
See this code:
dict={0L: [(355277,6.68901e+06), (355385,6.68906e+06)], 1L: [(355238,6.68909e+06), (355340,6.68915e+06)], 2L: [(355340,6.68915e+06), (355452,6.68921e+06)], 3L: [(355340,6.68915e+06), (355364,6.6891e+06)], 4L: [(355364,6.6891e+06), (355385,6.68906e+06)], 5L: [(355261,6.68905e+06), (355364,6.6891e+06)], 6L: [(355364,6.6891e+06), (355481,6.68916e+06)], 7L: [(355385,6.68906e+06), (355501,6.68912e+06)]}
dictionary = {}
list = []
for item in dict:
    list.append(dict[item])
b = []
for c in list:
    for x in c:
        if x not in b:
            b.append(x)
print b # or set(b)
res = {}
for elm in b:
    lst = []
    for item in dict:
        if dict[item][0] == elm:
            lst.append(dict[item][1])
        elif dict[item][1] == elm:
            lst.append(dict[item][0])
    res[elm] = lst
print res
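A single-pass alternative (Python 3) is sketched below. It indexes every endpoint with a `defaultdict(list)` and then keeps only the points touched by more than one line. It assumes the endpoints are hashable tuples that compare exactly equal, as in the printed output; real floating-point coordinates may need rounding before they can safely be used as keys.

```python
from collections import defaultdict

# Endpoints per feature id, copied from the question's output (0L -> 0, ...).
lines = {
    0: [(355277, 6.68901e+06), (355385, 6.68906e+06)],
    1: [(355238, 6.68909e+06), (355340, 6.68915e+06)],
    2: [(355340, 6.68915e+06), (355452, 6.68921e+06)],
    3: [(355340, 6.68915e+06), (355364, 6.6891e+06)],
    4: [(355364, 6.6891e+06), (355385, 6.68906e+06)],
    5: [(355261, 6.68905e+06), (355364, 6.6891e+06)],
    6: [(355364, 6.6891e+06), (355481, 6.68916e+06)],
    7: [(355385, 6.68906e+06), (355501, 6.68912e+06)],
}

# Map every endpoint to the opposite endpoint of each line touching it.
neighbours = defaultdict(list)
for fid in sorted(lines):
    first, last = lines[fid]
    neighbours[first].append(last)
    neighbours[last].append(first)

# Keep only the points shared by more than one line.
shared = {point: others for point, others in neighbours.items() if len(others) > 1}
print(shared[(355385, 6.68906e+06)])
```

Each line is visited once, so this avoids the nested loop over all pairs of features used above.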

Get dictionary with lowest key value from a list of dictionaries

From a list of dictionaries, I would like to get the dictionary with the lowest value for the 'cost' key, and then remove the other key/value pairs from that dictionary.
lst = [{'probability': '0.44076116', 'cost': '108.41'} , {'probability': '0.55923884', 'cost': '76.56'}]
You can supply a custom key function to the min() built-in function:
>>> min(lst, key=lambda item: float(item['cost']))
{'cost': '76.56', 'probability': '0.55923884'}
Or, if you just need the minimum cost value itself, you can take the minimum of the list of cost values:
costs = [float(item["cost"]) for item in lst]
print(min(costs))
@alecxe's solution is neat and short, +1 for him. Here's my way to do it:
>>> dict_to_keep = dict()
>>> min_cost = float('inf')
>>> for d in lst:
...     if float(d["cost"]) < min_cost:
...         min_cost = float(d["cost"])
...         dict_to_keep = d
...
>>> print (dict_to_keep)
{'cost': '76.56', 'probability': '0.55923884'}
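Neither snippet handles the second part of the question, removing the other key/value pairs. A minimal sketch, assuming that only the `'cost'` pair should survive, is to build a new dict from the selected one:

```python
lst = [{'probability': '0.44076116', 'cost': '108.41'},
       {'probability': '0.55923884', 'cost': '76.56'}]

# Pick the dictionary with the smallest numeric cost...
cheapest = min(lst, key=lambda item: float(item['cost']))
# ...then build a new dict that keeps only the 'cost' pair.
result = {'cost': cheapest['cost']}
print(result)  # {'cost': '76.56'}
```

Building a new dict leaves the original entries in `lst` untouched. If modifying in place is acceptable, deleting the unwanted keys from `cheapest` would work too.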

Find all occurrences inside a list

I'm trying to implement a function to find occurrences in a list, here's my code:
def all_numbers():
    num_list = []
    c.execute("SELECT * FROM myTable")
    for row in c:
        num_list.append(row[1])
    return num_list
def compare_results():
    look_up_num = raw_input("Lucky number: ")
    occurrences = [i for i, x in enumerate(all_numbers()) if x == look_up_num]
    return occurrences
I keep getting an empty list instead of the occurrences, even when I enter a number that is in the list.
Your code does the following:
1. It fetches everything from the database. Each row is a sequence.
2. Then, it takes all these results and adds them to a list.
3. It returns this list.
4. Next, your code goes through each item in the list (remember, it's a sequence, like a tuple) and fetches the item and its index (this is what enumerate does).
5. Next, you attempt to compare the sequence with a string, and if it matches, return it as part of a list.
At #5, the script fails because you are comparing a tuple to a string. Here is a simplified example of what you are doing:
>>> def all_numbers():
...     return [(1,5), (2,6)]
...
>>> lucky_number = 5
>>> for i, x in enumerate(all_numbers()):
...     print('{} {}'.format(i, x))
...     if x == lucky_number:
...         print('Found it!')
...
0 (1, 5)
1 (2, 6)
As you can see, on each iteration x is a tuple, so it will never equal 5, even though a matching row does exist.
You can have the database do your dirty work for you, by returning only the number of rows that match your lucky number:
def get_number_count(lucky_number):
    """ Returns the number of times the lucky_number
    appears in the database """
    # %s is the placeholder style of drivers such as MySQLdb or psycopg2; sqlite3 uses ?
    c.execute('SELECT COUNT(*) FROM myTable WHERE number_column = %s', (lucky_number,))
    result = c.fetchone()
    return result[0]
def get_input_number():
    """ Get the number to be searched in the database """
    lookup_num = raw_input('Lucky number: ')
    return get_number_count(lookup_num)
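To try the counting query without a real database, here is a minimal sketch using Python's built-in `sqlite3` with an in-memory database. The schema and rows are hypothetical (only the names `myTable` and `number_column` come from the answer), and the placeholder is `?` because that is the style `sqlite3` expects:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
c = conn.cursor()
# Hypothetical schema and rows, for illustration only.
c.execute('CREATE TABLE myTable (id INTEGER PRIMARY KEY, number_column INTEGER)')
c.executemany('INSERT INTO myTable (number_column) VALUES (?)', [(5,), (6,), (5,)])

def get_number_count(lucky_number):
    """Return how many rows hold lucky_number."""
    c.execute('SELECT COUNT(*) FROM myTable WHERE number_column = ?', (lucky_number,))
    return c.fetchone()[0]

print(get_number_count(5))  # 2
print(get_number_count(7))  # 0
```

Passing the value as a parameter tuple, rather than formatting it into the SQL string, lets the driver handle quoting.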
raw_input returns a string, so it never equals the integers coming from the database. Try converting it to a number:
occurrences = [i for i, x in enumerate(all_numbers()) if x == int(look_up_num)]
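The mismatch can be seen without a database. In this sketch, a hypothetical list of integers stands in for `all_numbers()`, and a string stands in for what `raw_input()` (or Python 3's `input()`) returns:

```python
# Hypothetical stand-in for all_numbers(): integers, as a DB driver returns them.
numbers = [7, 3, 7, 9]
look_up_num = '7'  # user input always arrives as a string

before = [i for i, x in enumerate(numbers) if x == look_up_num]       # 7 != '7'
after = [i for i, x in enumerate(numbers) if x == int(look_up_num)]

print(before)  # []
print(after)   # [0, 2]
```

`int()` raises `ValueError` for input that is not a valid integer, so real code may want to catch that.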