Splitting a list into new lists

So I have a list that contains ['A', 'A', 'R', 'O', 'N'], and I want to end up with a set of lists called letter1, letter2, letter3, and so on, that contain ['A'], ['A'], ['R'], and so on. How do I go about doing this without cloning the list five times and removing the extra parts?

You can iterate over the list:
In [1]: letters = ['A', 'A', 'R', 'O', 'N']
#use list comprehension to iterate over the list and place each element into a list
In [2]: [[l] for l in letters]
Out[2]: [['A'], ['A'], ['R'], ['O'], ['N']]
To give each list a name, we typically use a dictionary. For example:
# Create a dictionary
letters_dict = {}
# Iterate over the original list as above, except now saving to a dictionary
for i in range(len(letters)):
    letters_dict['letter' + str(i + 1)] = [letters[i]]
This gives you the following:
In [4]: letters_dict
Out[4]:
{'letter1': ['A'],
'letter2': ['A'],
'letter3': ['R'],
'letter4': ['O'],
'letter5': ['N']}
You can now access each of the lists as follows:
In [5]: letters_dict['letter1']
Out[5]: ['A']
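The same dictionary can also be built in a single expression with a dict comprehension and enumerate (start=1 so the numbering begins at letter1). A minimal sketch:

```python
letters = ['A', 'A', 'R', 'O', 'N']

# enumerate(..., start=1) yields (1, 'A'), (2, 'A'), ... so keys start at letter1
letters_dict = {'letter' + str(i): [letter] for i, letter in enumerate(letters, start=1)}

print(letters_dict['letter3'])  # ['R']
```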
Finally, just for completeness, there's a neat extension of the dictionary method: using code from this thread, you can access the entries as attributes:
# Create a dict subclass whose attribute access goes through key access
class atdict(dict):
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__
#create an instance of the class using our dictionary:
l = atdict(letters_dict)
This way, you can do the following:
In [11]: l.letter1
Out[11]: ['A']
In [12]: l.letter5
Out[12]: ['N']
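In Python 3.3+, the standard library's types.SimpleNamespace gives similar attribute access without a custom class. A sketch; one difference is that a missing name raises AttributeError rather than the KeyError the atdict class above produces:

```python
from types import SimpleNamespace

letters = ['A', 'A', 'R', 'O', 'N']
letters_dict = {'letter' + str(i): [letter] for i, letter in enumerate(letters, start=1)}

# Each dictionary key becomes an attribute of the namespace object
ns = SimpleNamespace(**letters_dict)

print(ns.letter1)  # ['A']
print(ns.letter5)  # ['N']
```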
If, as your question suggests, you have no desire to store the values in an iterable or referenceable object (i.e. a dictionary, list, or class), then you could literally do the following:
letter1 = [letters[0]]
letter2 = [letters[1]]
letter3 = [letters[2]]
# and so forth ...
But as you can see, even with five variables this becomes tedious.
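If you do want separate variables and the list length is known in advance, tuple unpacking removes most of the repetition. A minimal sketch:

```python
letters = ['A', 'A', 'R', 'O', 'N']

# Unpacking assigns each one-element list to its own name in one statement;
# it raises ValueError if the list does not have exactly five elements.
letter1, letter2, letter3, letter4, letter5 = [[l] for l in letters]

print(letter3)  # ['R']
```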

Related

Appending a list built using conditional and appended values to a list of lists

I'm currently working with two large CSV files of numerical data. One such CSV, which we will call X, is composed entirely of numerical data for test subjects. The columns of X are arranged as health measurements like so: (id, v1, v2, v3, v4). I am trying to take this information and create a list of lists where each list contains the information for a single person, i.e. in this fashion:
X=[['1','a','b','c','d'],
['1','e','f','g','h'],
['2','i','j','k','l'],
['3','m','n','o','p']]
listoflists=[ [['1','a','b','c','d'],['1','e','f','g','h']], #first row
['2','i','j','k','l'], #second
['3','m','n','o','p'] ] #third
(Let me know if I should edit the formatting: I wanted to present X as columns for readability. For the list of lists I just ran out of room, so listoflists = [a, b, c], where a is the first row, b is the second, and c is the third.)
I've tried something to the effect of this, but my biggest issue is I'm not sure where to create the list of those entities with matching data and then append it to the "master list".
# Collect the distinct subject ids (sorted so the output order is predictable)
ids = sorted(set(item[0] for item in X))
# Create the list of lists I want
listolists = []
for value in ids:
    sublist = []  # one sublist per subject id, created before scanning X
    for i in range(len(X)):
        if value == X[i][0]:
            sublist.append(X[i])
    listolists.append(sublist)  # append only after the sublist is filled
All help is appreciated. thanks.
Here is one way to do it:
X = [
    ['1','a','b','c','d'],
    ['1','e','f','g','h'],
    ['2','i','j','k','l'],
    ['3','m','n','o','p'],
]
numbers = {x[0] for x in X}
output = []
for num in sorted(numbers):
    new_list = [sub_list for sub_list in X if sub_list[0] == num]
    output.append(new_list)
print(output)
Output:
[[['1', 'a', 'b', 'c', 'd'], ['1', 'e', 'f', 'g', 'h']],
[['2', 'i', 'j', 'k', 'l']],
[['3', 'm', 'n', 'o', 'p']]]
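The loop above scans X once per id. A single-pass alternative groups the rows with collections.defaultdict; this is a sketch that assumes ids should appear in the order they are first seen, which relies on dicts preserving insertion order (Python 3.7+):

```python
from collections import defaultdict

X = [
    ['1', 'a', 'b', 'c', 'd'],
    ['1', 'e', 'f', 'g', 'h'],
    ['2', 'i', 'j', 'k', 'l'],
    ['3', 'm', 'n', 'o', 'p'],
]

# Group rows by their id column in a single pass over X
groups = defaultdict(list)
for row in X:
    groups[row[0]].append(row)

# Values come out in first-seen id order
output = list(groups.values())
print(output)
```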
If you need the 2nd and 3rd lists not nested like the first, let me know.
EDIT: for the exact format specified in your question:
X = [
    ['1','a','b','c','d'],
    ['1','e','f','g','h'],
    ['2','i','j','k','l'],
    ['3','m','n','o','p'],
]
numbers = {x[0] for x in X}
output = []
for num in sorted(numbers):
    new_list = [sub_list for sub_list in X if sub_list[0] == num]
    if len(new_list) > 1:
        output.append(new_list)
    else:
        output.append(new_list[0])
print(output)

How to merge dictionaries by key when the values are several different lists?

Someone asked what my input looks like:
The input is the output of a preceding function.
And when I do
print(H1_dict)
The following information is printed to the screen:
defaultdict(<class 'list'>, {2480: ['A', 'C', 'C'], 2651: ['T', 'A', 'G']})
which means the data type is a defaultdict whose default factory is list: the keys are integers and the values are lists.
So something like this:
H1dict = {2480: ['A', 'C', 'C'], 2651: ['T', 'A', 'G'].....}
H2dict = {2480: ['C', 'T', 'T'], 2651: ['C', 'C', 'A'].....}
H1_p1_values = {2480: ['0.25', '0.1', '0.083'], 2651: ['0.43', '0.11', '0.23']....}
H1_p2_values = {2480: ['0.15', '0.15', '0.6'], 2651: ['0.26', '0.083', '0.23']....}
H2_p1_values = {2480: ['0.3', '0.19', '0.5'], 2651: ['0.43', '0.17', '0.083']....}
H2_p2_values = {2480: ['0.3', '0.3', '0.1'], 2651: ['0.39', '0.26', '0.21']....}
I want to merge this dictionaries as:
merged_dict (class, list) or (key, values)= {2480: h1['A', 'C', 'C'], h2 ['C', 'T', 'T'], h1_p1['0.25', '0.1', '0.083'], h1_p2['0.15', '0.15', '0.6'], h2_p1['0.3', '0.19', '0.5'], h2_p2['0.3', '0.3', '0.1'], 2651: h1['T', 'A', 'G'], h2['C', 'C', 'A']....}
So, I want to merge several dictionaries by key but maintain the order in which the different dictionaries are supplied.
I am able to merge the dictionaries partially using:
merged = [haplotype_A, haplotype_B, hapA_freq_My, hapB_freq_My]  # ... and so on
merged_dict = {}
for k in haplotype_A:
    merged_dict[k] = tuple(d[k] for d in merged)
But I want to add another level of keys in front of each list, so I can access specific items in a large file when needed.
Downstream, I want to access the values inside this merged dictionary by key, each time with a for loop. Something like:
import math  # math.prod requires Python 3.8+
for k, v in merged_dict.items():
    h1_p1_sum = sum(float(x) for x in v['h1_p1'])
    h1_p1_prod = math.prod(float(x) for x in v['h1_p1'])
    h1_string = "-".join(str(x) for x in v['h1'])
    # and the ability to print or write it to a file line by line:
    print(h1_string)
    print(h1_p1_sum)
I have read several examples using defaultdict and other dicts, but I can't wrap my head around the process. I have been able to do simple operations, but something like this seems a little complicated. I would really appreciate an explanation of each step of the process.
Thank you in advance !
If I understand you correctly, you want this:
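from collections import defaultdict

# Label each input dictionary with the sub-key it should appear under
merged = {'h1': haplotype_A, 'h2': haplotype_B, 'h3': hapA_freq_My}  # ... and so on
merged_dict = defaultdict(dict)
for var_name in merged:
    for k in merged[var_name]:
        # outer key: the position (e.g. 2480); inner key: which input dict it came from
        merged_dict[k][var_name] = merged[var_name][k]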
This should give you an output of:
>>> merged_dict
defaultdict(<class 'dict'>, {2480: {'h1': ['A', 'C', 'C'], 'h2': ['C', 'T', 'T'], ...}, 2651: {...}})
given, of course, that the variables hold the example data from your question.
You can access them via nested for loops:
for k in merged_dict:
    for sub_key in merged_dict[k]:
        print(merged_dict[k][sub_key])  # prints the entire list
        for item in merged_dict[k][sub_key]:
            print(item)  # prints each item in the list
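Putting the pieces together with the sample data from the question, here is a runnable sketch (Python 3.8+ for math.prod) that builds the nested dictionary and then does the per-key sum, product and join the question asks about. The sub-key names 'h1', 'h2' and 'h1_p1' are illustrative labels, not anything defaultdict requires, and only two keys of the data are used:

```python
import math
from collections import defaultdict

# Sample data from the question, truncated to two keys
H1dict = {2480: ['A', 'C', 'C'], 2651: ['T', 'A', 'G']}
H2dict = {2480: ['C', 'T', 'T'], 2651: ['C', 'C', 'A']}
H1_p1_values = {2480: ['0.25', '0.1', '0.083'], 2651: ['0.43', '0.11', '0.23']}

# Illustrative sub-key labels for each input dictionary
sources = {'h1': H1dict, 'h2': H2dict, 'h1_p1': H1_p1_values}

# outer key -> {sub-key -> list}
merged_dict = defaultdict(dict)
for name, source in sources.items():
    for k, values in source.items():
        merged_dict[k][name] = values

results = {}
for k, v in merged_dict.items():
    h1_string = "-".join(v['h1'])
    h1_p1_sum = sum(float(x) for x in v['h1_p1'])
    h1_p1_prod = math.prod(float(x) for x in v['h1_p1'])
    results[k] = (h1_string, h1_p1_sum, h1_p1_prod)
    print(k, h1_string, h1_p1_sum, h1_p1_prod)
```

To write to a file instead, the same print calls accept a file= argument pointing at an open file.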

PYTHON 2.7 - Modifying List of Lists and Re-Assembling Without Mutating

I currently have a list of lists that looks like this:
My_List = [['This', 'Is', 'A', 'Sample', 'Text', 'Sentence'], ['This', 'too', 'is', 'a', 'sample', 'text'], ['finally', 'so', 'is', 'this', 'one']]
Now what I need to do is "tag" each of these words with one of three (in this case arbitrary) tags, such as "EE", "FF", or "GG", based on which list the word is in, and then reassemble them in the same order they came in. My final output would need to look like:
GG_List = ['This', 'Sentence']
FF_List = ['Is', 'A', 'Text']
EE_List = ['Sample']
My_List = [[('This', 'GG'), ('Is', 'FF'), ('A', 'FF'), ('Sample', 'EE'), ('Text', 'FF'), ('Sentence', 'GG')],
           [...],  # same with this sentence
           [...]]  # and this one
I tried using for loops to turn each item into a dict, but the dicts then got rearranged by their tags, which can't happen here: the experiment needs everything to stay in the same order, because eventually I need to measure the proximity of tags relative to one another, but only within the same sentence (list).
I thought about doing this with NLTK (which I have little experience with), but it looks like that is much more sophisticated than what I need, and the tags aren't easily customized by a novice like myself.
I think this could be done by iterating through each of these items, using an if statement as I have below to determine which tag each should have, and then making a tuple out of the word and its associated tag so it doesn't shift around within its list.
I've devised this, but I can't figure out how to rebuild my list of lists and keep it in order :(
for i in My_List:  # For each list in the list of lists
    for h in i:  # For each item in each list
        if h in GG_List:  # Check for the tag
            MyDicts = {"GG": h for h in i}  # Make dict from tag + word
Thank you so much for your help!
Putting the tags in a dictionary would work:
My_List = [['This', 'Is', 'A', 'Sample', 'Text', 'Sentence'],
['This', 'too', 'is', 'a', 'sample', 'text'],
['finally', 'so', 'is', 'this', 'one']]
GG_List = ['This', 'Sentence']
FF_List = ['Is', 'A', 'Text']
EE_List = ['Sample']
zipped = zip((GG_List, FF_List, EE_List), ('GG', 'FF', 'EE'))
tags = {item: tag for tag_list, tag in zipped for item in tag_list}
res = [[(word, tags[word]) for word in entry if word in tags] for entry in My_List]
Now:
>>> res
[[('This', 'GG'),
('Is', 'FF'),
('A', 'FF'),
('Sample', 'EE'),
('Text', 'FF'),
('Sentence', 'GG')],
[('This', 'GG')],
[]]
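Note that the `if word in tags` filter drops untagged words, which shifts positions within each sentence. Since the question needs to measure tag proximity, a variant that keeps every word and marks untagged ones with None may fit better. This is a sketch; None as the placeholder is an arbitrary choice:

```python
My_List = [['This', 'Is', 'A', 'Sample', 'Text', 'Sentence'],
           ['This', 'too', 'is', 'a', 'sample', 'text'],
           ['finally', 'so', 'is', 'this', 'one']]
GG_List = ['This', 'Sentence']
FF_List = ['Is', 'A', 'Text']
EE_List = ['Sample']

# Map each word to its tag; a later list wins if a word appears in more than one
tags = {}
for tag_list, tag in zip((GG_List, FF_List, EE_List), ('GG', 'FF', 'EE')):
    for item in tag_list:
        tags[item] = tag

# dict.get returns None for untagged words, so every sentence keeps its length and order
res = [[(word, tags.get(word)) for word in entry] for entry in My_List]
print(res[1])
```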
A dictionary works by key-value pairs: each key is assigned a value, and you look up a value by its key, e.g.
>>> d = {1:'a', 2:'b', 3:'c'}
>>> d[1]
'a'
In the above case, we always search the dictionary by its keys, i.e. the integers.
If you want to assign a tag/label to each word, you are searching by the word as the key and retrieving the "value", i.e. the tag/label, so your dictionary would have to look something like this (assuming the strings are the words and the numbers are the tags/labels):
>>> d = {'a':1, 'b':1, 'c':3}
>>> d['a']
1
>>> sent = 'a b c a b'.split()
>>> sent
['a', 'b', 'c', 'a', 'b']
>>> [d[word] for word in sent]
[1, 1, 3, 1, 1]
This way the order of the tags follows the order of the words when you use a list comprehension to iterate through the words and find the appropriate tags.
So the problem comes when the initial dictionary is indexed the wrong way round, i.e. the keys are labels and the values are words, e.g.:
>>> d = {1:['a', 'd'], 2:['b', 'h'], 3:['c', 'x']}
>>> [d[word] for word in sent]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'a'
Then you would have to invert your dictionary. Assuming that all elements in your value lists are unique, you can do this:
>>> from collections import ChainMap
>>> d = {1:['a', 'd'], 2:['b', 'h'], 3:['c', 'x']}
>>> d_inv = dict(ChainMap(*[{value:key for value in values} for key, values in d.items()]))
>>> d_inv
{'h': 2, 'c': 3, 'a': 1, 'x': 3, 'b': 2, 'd': 1}
But the caveat is that ChainMap was only added in Python 3.3, so it isn't available in the Python 2.7 you are using (yet another reason to upgrade your Python ;P). For Python 2.7 solutions, see How do I merge a list of dicts into a single dict?.
So going back to the problem of assigning labels/tags to words, let's say we have this input:
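A plain nested dict comprehension does the same inversion without ChainMap and also runs on Python 2.7. A sketch under the same assumption that each word appears in only one value list (if it appears in several, the last one iterated wins):

```python
d = {1: ['a', 'd'], 2: ['b', 'h'], 3: ['c', 'x']}

# For every (tag, words) pair, map each word back to its tag
d_inv = {value: key for key, values in d.items() for value in values}

sent = 'a b c a b'.split()
print([d_inv[word] for word in sent])  # [1, 2, 3, 1, 2]
```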
>>> d = {1:['a', 'd'], 2:['b', 'h'], 3:['c', 'x']}
>>> sent = 'a b c a b'.split()
First, we invert the dictionary (assuming that every word maps to exactly one tag/label):
>>> d_inv = dict(ChainMap(*[{value:key for value in values} for key, values in d.items()]))
Then, we apply the tags to the words through a list comprehension:
>>> [d_inv[word] for word in sent]
[1, 2, 3, 1, 2]
And for multiple sentences:
>>> sentences = ['a b c'.split(), 'h a x'.split()]
>>> [[d_inv[word] for word in sent] for sent in sentences]
[[1, 2, 3], [2, 1, 3]]

Merge elements of inner list with outer list

I have a following list
mylist = ['or', ['or', 'R', ['not', 'B']], 'W']
and wish to remove the double occurrence of 'or' within the list to get the final result as
['or', 'R', ['not', 'B'], 'W']
You could use a recursive method in a class to remove, from every nested list, any string that also appears at the first level of your list. This also cleans those strings from more deeply nested lists.
class FirstLevelRepeated(object):
    def __init__(self, my_list):
        self.mylist = my_list

    def removeRepeated(self):
        # Clean every nested list found at the first level
        for obj_index in range(len(self.mylist)):
            if isinstance(self.mylist[obj_index], list):
                self.RemoveRepeated(self.mylist[obj_index])

    def RemoveRepeated(self, actual_object):
        # Remove (in place) any string that also appears at the first level
        index_to_remove = []
        for actual_object_index in range(len(actual_object)):
            if isinstance(actual_object[actual_object_index], list):
                self.RemoveRepeated(actual_object[actual_object_index])
            else:
                if actual_object[actual_object_index] in self.mylist:
                    index_to_remove.append(actual_object_index)
        # Pop from the end so the earlier indices stay valid
        for index in index_to_remove[::-1]:
            actual_object.pop(index)
        return actual_object

mylist = ['or', ['or', 'R', ['not', 'B']], 'W']
print(mylist)
list_of_lists = FirstLevelRepeated(mylist)
list_of_lists.removeRepeated()
print(list_of_lists.mylist)
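Note that the class above removes the inner 'or' but leaves its list in place, so it prints ['or', ['R', ['not', 'B']], 'W'] rather than the requested ['or', 'R', ['not', 'B'], 'W']. If the list is a prefix-notation expression whose first element is the operator (an assumption based on the example), a recursive function that splices a child's operands into its parent whenever the child repeats the parent's operator produces the requested shape. The name flatten_same_op is hypothetical:

```python
def flatten_same_op(expr):
    """Merge nested sub-lists that repeat the parent's operator (e.g. 'or')."""
    if not isinstance(expr, list) or not expr:
        return expr
    op = expr[0]
    result = [op]
    for arg in expr[1:]:
        arg = flatten_same_op(arg)  # flatten children first
        if isinstance(arg, list) and arg and arg[0] == op:
            result.extend(arg[1:])  # splice the child's operands into the parent
        else:
            result.append(arg)
    return result

mylist = ['or', ['or', 'R', ['not', 'B']], 'W']
print(flatten_same_op(mylist))  # ['or', 'R', ['not', 'B'], 'W']
```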

Dictionary Key Error

I am trying to construct a dictionary with values from a CSV file. Say there are 10 columns, and I want to use the first column as the key and the remaining columns as the values.
When I set it up with a for loop, the dictionary ends up with only one value. Kindly suggest a way.
import csv
import numpy
aname = {}
#loading the file in numpy
result=numpy.array(list(csv.reader(open('somefile',"rb"),delimiter=','))).astype('string')
# build a dict
r = {aname[rows[0]]: rows[1:] for rows in result}
print r[0]
The error is as follows:
r = {aname[rows[0]]: rows[1:] for rows in result}
KeyError: '2a9ac84c-3315-5576-4dfd-8bc34072360d|11937055'
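I'm not entirely sure what you mean to do here, but the KeyError comes from aname[rows[0]]: aname is an empty dictionary, so looking up any key in it fails. Use rows[0] itself as the key, e.g.: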
>>> result = [[1, 'a', 'b'], [2, 'c', 'd']]
>>> dict([(row[0], row[1:]) for row in result])
{1: ['a', 'b'], 2: ['c', 'd']}
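For the original CSV problem, here is a sketch that skips NumPy and builds the dictionary straight from csv.reader (Python 3). An io.StringIO with made-up rows stands in for open('somefile', newline=''). Keys are the first-column strings, so a lookup like r[0] in the question would also raise KeyError; look up by the first-column value instead. If the first column repeats, later rows overwrite earlier ones:

```python
import csv
import io

# Made-up sample rows standing in for the real file contents
sample = io.StringIO("key1,a,b,c\nkey2,d,e,f\n")

# First column becomes the key, the remaining columns become the value list
r = {row[0]: row[1:] for row in csv.reader(sample)}

print(r['key1'])  # ['a', 'b', 'c']
```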