Dictionary w nested dicts to list in specified order - list

Sorry if this post seems redundant. I've looked through a bunch of other posts and can't seem to find what I'm looking for, perhaps because I'm a Python newbie trying to write basic code...
Given a dictionary of any size: some keys have a single value, others have a nested dictionary as their value.
I would like to convert the dictionary into a list (including the nested keys and values as list items) but in a specific order.
For example:
d = {'E':{'e3': 'Zzz', 'e1':'Xxx', 'e2':'Yyy'}, 'D': {'d3': 'Vvv', 'd1':'Nnn', 'd2':'Kkk'}, 'U': 'Bbb'}
and I would like it to look like this:
order_list = ['U', 'D', 'E'] # given this order...
final_L = ['U', 'Bbb', 'D', 'd1', 'Nnn', 'd2', 'Kkk', 'd3', 'Vvv', 'E', 'e1', 'Xxx', 'e2', 'Yyy', 'e3', 'Zzz']
I can make the main keys fall into order but not the nested values. Here's what I have so far...
d = {'E':{'e3': 'Zzz', 'e1':'Xxx', 'e2':'Yyy'}, 'D': {'d3': 'Vvv', 'd1':'Nnn', 'd2':'Kkk'}, 'U': 'Bbb'}
order_list = ['U', 'D', 'E']
temp_list = []
for x in order_list:
    for key,value in d.items():
        if key == x:
            temp_list.append([key,value])
final_L = [item for sublist in temp_list for item in sublist]
print(final_L)
My current output is:
['U', 'Bbb', 'D', {'d1': 'Nnn', 'd2': 'Kkk', 'd3': 'Vvv'}, 'E', {'e1': 'Xxx', 'e3': 'Zzz', 'e2': 'Yyy'}]

So there are a couple of easy transformations to make with a list comprehension:
>>> [(k, sorted(d[k].items()) if isinstance(d[k], dict) else d[k]) for k in 'UDE']
[('U', 'Bbb'),
('D', [('d1', 'Nnn'), ('d2', 'Kkk'), ('d3', 'Vvv')]),
('E', [('e1', 'Xxx'), ('e2', 'Yyy'), ('e3', 'Zzz')])]
Now you just need to flatten an arbitrary depth list, here's a post describing how to do that:
from collections.abc import Iterable
def flatten(l):
    for el in l:
        # Recurse into nested iterables, but treat strings as single items
        if isinstance(el, Iterable) and not isinstance(el, str):
            yield from flatten(el)
        else:
            yield el
>>> list(flatten((k, sorted(d[k].items()) if isinstance(d[k], dict) else d[k]) for k in 'UDE'))
['U', 'Bbb', 'D', 'd1', 'Nnn', 'd2', 'Kkk', 'd3', 'Vvv', 'E', 'e1', 'Xxx', 'e2', 'Yyy', 'e3', 'Zzz']
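If only one level of nesting is expected, as in the example, a direct loop avoids the generic flatten step. This is a minimal sketch under that assumption; the function name flatten_in_order is made up for illustration, and the nested keys are sorted alphabetically because that matches the desired output in the question.

```python
def flatten_in_order(d, order):
    """Flatten d into a list, visiting top-level keys in the given order."""
    result = []
    for key in order:
        value = d[key]
        result.append(key)
        if isinstance(value, dict):
            # Nested dict: emit its keys in sorted order, each followed by its value
            for sub_key in sorted(value):
                result.extend([sub_key, value[sub_key]])
        else:
            # Plain value: emit it as is
            result.append(value)
    return result


d = {'E': {'e3': 'Zzz', 'e1': 'Xxx', 'e2': 'Yyy'},
     'D': {'d3': 'Vvv', 'd1': 'Nnn', 'd2': 'Kkk'},
     'U': 'Bbb'}
order_list = ['U', 'D', 'E']
final_L = flatten_in_order(d, order_list)
print(final_L)
```

Deeper nesting would still need a recursive approach such as the flatten generator above.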

Related

Transform list to the dictionary

I have a list of strings:
['scene-task-v001-user', 'scene-task-v002-user', 'scene-explo-v001-user', 'scene-train-v001-user', 'scene-train-v002-user']
The strings match this regular expression:
'(?P<scene>\w+)-(?P<task>\w+)-v(?P<ver>\d{3,})-(?P<user>\w+)'
I need to create a dictionary where each key is a task group and each value contains all the ver groups with the same task:
{'task': ['001', '002'], 'explo': ['001'], 'train': ['001', '002']}
How do I do it?
Thanks!
First of all, ('t-1', 't-2', 's-1', 'z-1', 'z-2') is a tuple, not a list. In addition, {'t': {'1', '2'}, 's': {'1'}, 'z': {'1', '2'}} uses sets ({}) for the values; the values should be lists here. I corrected this in my code below.
Instead of using a regular expression, you can loop over the list and split each item on '-' to get the keys and values, as follows:
from collections import defaultdict
l = ('t-1', 't-2', 's-1', 'z-1', 'z-2')
d = defaultdict(list)
for item in l:
    key, val = item.split('-')
    d[key].append(val)
print(d) # defaultdict(<class 'list'>, {'t': ['1', '2'], 's': ['1'], 'z': ['1', '2']})
print(d['t']) # ['1', '2']
Using regular expressions to get keys and values for a dictionary:
from collections import defaultdict
import re
l = ('t-1', 't-2', 's-1', 'z-1', 'z-2')
d = defaultdict(list)
# Compile the patterns once, outside the loop (raw strings avoid escape warnings)
key_pattern = re.compile(r'\w-')
val_pattern = re.compile(r'-\w')
for item in l:
    key = key_pattern.search(item).group().replace('-', '')
    val = val_pattern.search(item).group().replace('-', '')
    d[key].append(val)
print(d) # defaultdict(<class 'list'>, {'t': ['1', '2'], 's': ['1'], 'z': ['1', '2']})
print(d['t']) # ['1', '2']
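For the strings in the original question, the named groups in the asker's own pattern can be used directly. The following is a sketch under the assumption that every string should match the whole pattern; the variable names (names, versions_by_task, result) are illustrative, and non-matching strings are simply skipped.

```python
import re
from collections import defaultdict

# Pattern taken from the question, written as a raw string
pattern = re.compile(r'(?P<scene>\w+)-(?P<task>\w+)-v(?P<ver>\d{3,})-(?P<user>\w+)')

names = ['scene-task-v001-user', 'scene-task-v002-user',
         'scene-explo-v001-user', 'scene-train-v001-user',
         'scene-train-v002-user']

versions_by_task = defaultdict(list)
for name in names:
    match = pattern.fullmatch(name)
    if match is None:
        continue  # skip strings that do not fit the pattern
    # Group the 'ver' values under their 'task' value
    versions_by_task[match.group('task')].append(match.group('ver'))

result = dict(versions_by_task)  # plain dict for clean printing
print(result)
```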

Filter and limit on a python dictionary

Given:
obj = {}
obj['a'] = ['x', 'y', 'z']
obj['b'] = ['x', 'y', 'z', 'u', 't']
obj['c'] = ['x']
obj['d'] = ['y', 'u']
How do you select (e.g. print) the top 2 entries in this dictionary, sorted by the length of each list?
the top 2 entries in this dictionary, sorted by the length of each list
print(sorted(obj.values(), key=len)[:2])
The output:
[['x'], ['y', 'u']]
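Note that sorting by len in ascending order returns the two shortest lists. If "top 2" is meant as the two longest, or if the keys are needed as well, something like the following sketch may help; the names shortest_two and longest_two are only illustrative.

```python
obj = {
    'a': ['x', 'y', 'z'],
    'b': ['x', 'y', 'z', 'u', 't'],
    'c': ['x'],
    'd': ['y', 'u'],
}

# Sort (key, list) pairs by the length of the list
by_length = sorted(obj.items(), key=lambda kv: len(kv[1]))

shortest_two = by_length[:2]
# reverse=True puts the longest lists first
longest_two = sorted(obj.items(), key=lambda kv: len(kv[1]), reverse=True)[:2]

print(shortest_two)
print(longest_two)
```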

How merge dictionary with key values but which contains several different list values?

Someone asked what my input looks like:
The input is the output of a preceding function.
When I do
print(H1_dict)
The following information is printed to the screen:
defaultdict(<class 'list'>, {2480: ['A', 'C', 'C'], 2651: ['T', 'A', 'G']})
which means the data is a defaultdict whose default factory is list, so each key maps to a list of values.
So something like this:
H1dict = {2480: ['A', 'C', 'C'], 2651: ['T', 'A', 'G'].....}
H2dict = {2480: ['C', 'T', 'T'], 2651: ['C', 'C', 'A'].....}
H1_p1_values = {2480: ['0.25', '0.1', '0.083'], 2651: ['0.43', '0.11', '0.23']....}
H1_p2_values = {2480: ['0.15', '0.15', '0.6'], 2651: ['0.26', '0.083', '0.23']....}
H2_p1_values = {2480: ['0.3', '0.19', '0.5'], 2651: ['0.43', '0.17', '0.083']....}
H2_p2_values = {2480: ['0.3', '0.3', '0.1'], 2651: ['0.39', '0.26', '0.21']....}
I want to merge these dictionaries as:
merged_dict (class, list) or (key, values)= {2480: h1['A', 'C', 'C'], h2 ['C', 'T', 'T'], h1_p1['0.25', '0.1', '0.083'], h1_p2['0.15', '0.15', '0.6'], h2_p1['0.3', '0.19', '0.5'], h2_p2['0.3', '0.3', '0.1'], 2651: h1['T', 'A', 'G'], h2['C', 'C', 'A']....}
So, I want to merge several dictionaries by key while maintaining the order in which the dictionaries are supplied.
I am able to merge the dictionaries partially using:
merged = [haplotype_A, haplotype_B, hapA_freq_My, hapB_freq_My....]
merged_dict = {}
for k in haplotype_A:
    # one tuple per key, holding that key's value from each dictionary, in order
    merged_dict[k] = tuple(source[k] for source in merged)
But I want to add a second level of keys in front of each list, so I can access specific items in a large file when needed.
Downstream, I want to access the values inside this merged dictionary by key in a for-loop. Something like:
for k, v in merged_dict.items():
    h1_p1_sum = sum(float(x) for x in v['h1_p1'])  # or v[index]
    h1_p1_prod = math.prod(float(x) for x in v['h1_p1'])  # or v[index]; needs import math
    h1_string = "-".join(str(x) for x in v['h1'])
and the ability to print or write it to the file line by line
print (h1_string)
print (h1_p1_sum)
I have read several examples on defaultdict and other dicts but am not able to wrap my head around the process. I have been able to do simple operations, but something like this seems a little complicated. I would really appreciate any explanation that you may add to each step of the process.
Thank you in advance !
If I understand you correctly, you want this:
merged = {'h1': haplotype_A, 'h2': haplotype_B, 'h3': hapA_freq_My, ...}
merged_dict = defaultdict(dict)  # needs: from collections import defaultdict
for var_name in merged:
    for k in merged[var_name]:
        merged_dict[k][var_name] = merged[var_name][k]
This should give you a structure like:
>>> merged_dict
{2480: {'h1': ['A', 'C', 'C'], 'h2': ['C', 'T', 'T'], ..}, 2651: {...}}
assuming, of course, that the variables hold the same data as in your example.
You can access them via nested for loops:
for k in merged_dict:
    for sub_key in merged_dict[k]:
        print(merged_dict[k][sub_key]) # print entire list
        for item in merged_dict[k][sub_key]:
            print(item) # prints item in list
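Putting the pieces together, here is a small runnable sketch using part of the sample data from the question. The labels 'h1', 'h2' and 'h1_p1' and the names sources and summary are assumptions chosen for illustration, and math.prod requires Python 3.8 or later.

```python
import math
from collections import defaultdict

# Labelled input dictionaries; insertion order of the labels is preserved
sources = {
    'h1': {2480: ['A', 'C', 'C'], 2651: ['T', 'A', 'G']},
    'h2': {2480: ['C', 'T', 'T'], 2651: ['C', 'C', 'A']},
    'h1_p1': {2480: ['0.25', '0.1', '0.083'], 2651: ['0.43', '0.11', '0.23']},
}

# Regroup: position key -> label -> list of values
merged_dict = defaultdict(dict)
for label, source in sources.items():
    for key, values in source.items():
        merged_dict[key][label] = values

# Per-key summaries: joined string, sum and product of the frequencies
summary = {}
for key, entry in merged_dict.items():
    h1_string = '-'.join(entry['h1'])
    h1_p1_sum = sum(float(x) for x in entry['h1_p1'])
    h1_p1_prod = math.prod(float(x) for x in entry['h1_p1'])
    summary[key] = (h1_string, h1_p1_sum, h1_p1_prod)
    print(key, h1_string, h1_p1_sum, h1_p1_prod)
```

Each inner dictionary keeps its labels in the order the source dictionaries were supplied, which is the ordering the question asks for.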

comparing two lists of unequal length at each index

I have two lists of unequal length such as
list1 = ['G','T','C','A','G']
list2 = ['AAAAA','TTTT','GGGG','CCCCCCCC']
I want to compare these two lists index by index, i.e. list2[0] against list1[0], list2[1] against list1[1], and so on up to the length of list1.
I want to get two new lists, one holding the mismatches and the other holding the positions of the mismatches. In pseudocode it could be stated as:
if 'G' == 'GGG' or 'G' # where 'G' is from list1[1] and 'GGG' is from list2[2]
elif 'G' == 'AAA'
{
outlist1 == list1[index] # position of mismatch
outlist2 == 'G/A'
}
OK, this works. There are definitely ways to do it in less code, but I think this is pretty clear:
#Function to process the lists
def get_mismatches(list1,list2):
    #Prepare the output lists
    mismatch_list = []
    mismatch_pos = []
    #Figure out which list is smaller
    smaller_list_len = min(len(list1),len(list2))
    #Loop through the lists checking element by element
    for ind in range(smaller_list_len):
        elem1 = list1[ind][0] #First char of string 1, such as 'G'
        elem2 = list2[ind][0] #First char of string 2, such as 'A'
        #If they match just continue
        if elem1 == elem2:
            continue
        #If they don't match update the output lists
        else:
            mismatch_pos.append(ind)
            mismatch_list.append(elem1+'/'+elem2)
    #Return the output lists
    return mismatch_list,mismatch_pos
#Make input lists
list1 = ['G','T','C','A','G']
list2 = ['AAAAA','TTTT','GGGG','CCCCCCCC']
#Call the function to get the output lists
outlist1,outlist2 = get_mismatches(list1,list2)
#Print the output lists:
print(outlist1)
print(outlist2)
Output:
['G/A', 'C/G', 'A/C']
[0, 2, 3]
And just to see how short I could get the code, I made this function, which I think is equivalent:
def short_get_mismatches(l1,l2):
    #Build (mismatch, position) pairs, then split them into two sequences
    o1,o2 = zip(*[(x[0]+'/'+y[0],i) for i,(x,y) in enumerate(zip(l1,l2)) if x[0] != y[0]])
    return list(o1),list(o2)
#Make input lists
list1 = ['G','T','C','A','G']
list2 = ['AAAAA','TTTT','GGGG','CCCCCCCC']
#Call the function to get the output lists
outlist1,outlist2 = short_get_mismatches(list1,list2)
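One caveat with the zip(*...) one-liner: when there are no mismatches at all, zip(*[]) yields nothing and the two-name unpacking raises a ValueError. A slightly longer sketch that returns two empty lists in that case could look like this; the name get_mismatches_safe is hypothetical.

```python
def get_mismatches_safe(list1, list2):
    """Return (mismatches, positions), comparing first characters index by index."""
    # zip stops at the shorter list, so extra items in the longer one are ignored
    pairs = [
        (x[0] + '/' + y[0], i)
        for i, (x, y) in enumerate(zip(list1, list2))
        if x[0] != y[0]
    ]
    mismatches = [m for m, _ in pairs]
    positions = [i for _, i in pairs]
    return mismatches, positions


list1 = ['G', 'T', 'C', 'A', 'G']
list2 = ['AAAAA', 'TTTT', 'GGGG', 'CCCCCCCC']
outlist1, outlist2 = get_mismatches_safe(list1, list2)
print(outlist1)
print(outlist2)
```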
EDIT:
I'm not sure if I'm cleaning the sequence as you want w/ the N's and -'s. Is this the answer to the example in your comment?
Unclean list1 ['A', 'T', 'G', 'C', 'A', 'C', 'G', 'T', 'C', 'G']
Clean list1 ['A', 'T', 'G', 'C', 'A', 'C', 'G', 'T', 'C', 'G']
Unclean list2 ['GGG', 'TTTN', '-', 'NNN', 'AAA', 'CCC', 'GCCC', 'TTT', 'CCCTN']
Clean list2 ['GGG', 'TTT', 'AAA', 'CCC', 'GCCC', 'TTT', 'CCCT']
0 A GGG
1 T TTT
2 G AAA
3 C CCC
4 A GCCC
5 C TTT
6 G CCCT
['A/G', 'G/A', 'A/G', 'C/T', 'G/C']
[0, 2, 4, 5, 6]
this works fine for my question:
#!/usr/bin/env python
list1=['A', 'T', 'G', 'C', 'A' ,'C', 'G' , 'T' , 'C', 'G']
list2=[ 'GGG' , 'TTTN' , ' - ' , 'NNN' , 'AAA' , 'CCC' , 'GCCC' , 'TTT' ,'CCCATN' ]
notifications = []
indexes = []
for i in range(min(len(list1), len(list2))):
    item1 = list1[i]
    item2 = list2[i]
    # Skip ' - '
    if item2 == ' - ':
        continue
    # Remove N since it's a wildcard
    item2 = item2.replace('N', '')
    # Remove item1
    item2 = item2.replace(item1, '')
    chars = set(item2)
    # All matched
    if len(chars) == 0:
        continue
    # Sort the leftover characters so the output order is deterministic
    notifications.append('{}/{}'.format(item1, '/'.join(sorted(chars))))
    indexes.append(i)
print(notifications)
print(indexes)
It gives the output as
['A/G', 'G/C', 'C/A/T']
[0, 6, 8]

Splitting a list into new lists

So I have a list plaintext that contains ['A', 'A', 'R', 'O', 'N'] and I want to end up with a set of lists called letter1, letter2, letter3, and so on, that contain ['A'], ['A'], ['R'], and so on. How do I go about doing this without cloning the list five times and removing the extra parts?
You can iterate over the list:
In [1]: letters = ['A', 'A', 'R', 'O', 'N']
#use list comprehension to iterate over the list and place each element into a list
In [2]: [[l] for l in letters]
Out[2]: [['A'], ['A'], ['R'], ['O'], ['N']]
To add titles, we typically use a dictionary. For example
#create a dictionary
letters_dict = {}
#iterate over original list as above except now saving to a dictionary
for i in range(len(letters)):
    letters_dict['letter'+str(i+1)] = [letters[i]]
This gives you the following:
In [4]: letters_dict
Out[4]:
{'letter1': ['A'],
'letter2': ['A'],
'letter3': ['R'],
'letter4': ['O'],
'letter5': ['N']}
You can now access each of the lists as follows:
In [5]: letters_dict['letter1']
Out[5]: ['A']
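The same dictionary can also be built in one step with a dict comprehension and enumerate. This is just an alternative sketch of the loop above, not a different technique:

```python
letters = ['A', 'A', 'R', 'O', 'N']

# enumerate(..., start=1) numbers the items from 1, matching letter1, letter2, ...
letters_dict = {'letter' + str(i): [ch] for i, ch in enumerate(letters, start=1)}

print(letters_dict['letter1'])
print(letters_dict['letter5'])
```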
Finally, just for completeness, there's a cool extension of the dictionary method. Namely, using code from this thread, you can do the following:
#create a class
class atdict(dict):
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__
#create an instance of the class using our dictionary:
l = atdict(letters_dict)
This way, you can do the following:
In [11]: l.letter1
Out[11]: ['A']
In [12]: l.letter5
Out[12]: ['N']
If you have no desire to store the values in an iterable or referenceable object (i.e. dictionary, list, class) as you suggest in your question, then you could literally do the below:
letter1 = [letters[0]]
letter2 = [letters[1]]
letter3 = [letters[2]]
#and so forth ...
but as you can see, even with 5 variables the above becomes tedious.