Forming and parsing multidimensional lists in python - list

I have the following text file-
http://www.ncbi.nlm.nih.gov/Class/FieldGuide/BLOSUM62.txt
I need Python code that gives me specific entries of the matrix. I'm using multidimensional lists and would prefer to do it without the numpy library. My intent is to form lists within lists, where the outer (main) list contains the rows of the matrix and each inner list contains the cells of one row.
I'm using the following code-
handle=open(fname)
li=[]
matrix=[]
for line in handle:
    if not line.startswith('#'):
        a=line.split()
        for i in a:
            li.append(i)
        matrix.append(li)
print matrix
However, this just returns a one dimensional list with each element being one cell of the matrix. I'm lost regarding how to fix this. The output should be something of this form-
[['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V', 'B', 'Z', 'X', '*'],
['A', '4', '-1', '-2', '-2', '0', '-1', '-1', '0', '-2', '-1', '-1', '-1', '-1', '-2', '-1', '1', '0', '-3', '-2', '0', '-2', '-1', '0', '-4']]

I think you want to produce a matrix in which, for example, matrix[0][1] refers to a single value, right? See the following code.
handle=open(fname)
matrix=[]
col={}
idx=0
row={}
idr=0
# get 1st line as column
first_line=0
for line in handle:
    if not line.startswith('#'):
        if first_line == 0:
            first_line=1
            # get column header
            for i in line.split():
                col[i]=idx
                idx=idx+1
        else:
            a = line.split()
            x = a.pop(0)
            # get row name
            row[x]=idr
            matrix.append(a)
            idr=idr+1
print matrix
# index by row first, then by column
print matrix[row['A']][col['A']]
See if this is what you want.

You aren't getting the results you want because you're putting all the values into the same li list. The simplest fix is to create li inside the loop, so that each line gets a fresh list:
handle=open(fname)
matrix=[]
for line in handle:
    if not line.startswith('#'):
        li=[] # move this line down!
        a=line.split()
        for i in a:
            li.append(i)
        matrix.append(li)
print matrix
The inner loop there is a bit silly though. You're adding all the values from one list (a) to another list (li), then throwing away the first list. You should just use the list returned by str.split directly:
handle=open(fname)
matrix=[]
for line in handle:
    if not line.startswith('#'):
        matrix.append(line.split())
print matrix
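As a rough sketch of where this leads, the rows built with str.split can be paired with two small lookup dictionaries so that cells are addressed by residue letter. This is written for Python 3 and makes a few assumptions: the name parse_matrix and the three-residue sample text are made up for illustration (the sample only mimics the layout of the BLOSUM62 file), and the cells are converted to int, which the original code does not do.

```python
# Hypothetical sample in the same layout as the BLOSUM62 file:
# comment lines start with '#', the first data line is the column header,
# and every later line starts with its row label.
sample = """\
# sample rows, truncated to three residues
   A  R  N
A  4 -1 -2
R -1  5  0
N -2  0  6
"""

def parse_matrix(lines):
    """Return (values, row_index, col_index) for a labelled text matrix."""
    rows = [line.split() for line in lines
            if line.strip() and not line.startswith('#')]
    header, body = rows[0], rows[1:]
    col_index = {name: i for i, name in enumerate(header)}
    row_index = {row[0]: i for i, row in enumerate(body)}
    # Drop the leading row label and convert each cell to an integer.
    values = [[int(cell) for cell in row[1:]] for row in body]
    return values, row_index, col_index

values, row_index, col_index = parse_matrix(sample.splitlines())
print(values[row_index['A']][col_index['R']])
```

With a real file, passing the open file object (for example `parse_matrix(open(fname))`) should work the same way, since the function only iterates over lines.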

Related

How to merge dictionaries by key when the values are several different lists?

Someone asked what my input looks like:
The input is the output of a preceding function.
And when I do
print(H1_dict)
The following information is printed to the screen:
defaultdict(<class 'list'>, {2480: ['A', 'C', 'C'], 2651: ['T', 'A', 'G']})
which means the data is a defaultdict that creates a list for missing keys, and each key maps to a list of values.
So something like this:
H1dict = {2480: ['A', 'C', 'C'], 2651: ['T', 'A', 'G'].....}
H2dict = {2480: ['C', 'T', 'T'], 2651: ['C', 'C', 'A'].....}
H1_p1_values = {2480: ['0.25', '0.1', '0.083'], 2651: ['0.43', '0.11', '0.23']....}
H1_p2_values = {2480: ['0.15', '0.15', '0.6'], 2651: ['0.26', '0.083', '0.23']....}
H2_p1_values = {2480: ['0.3', '0.19', '0.5'], 2651: ['0.43', '0.17', '0.083']....}
H2_p2_values = {2480: ['0.3', '0.3', '0.1'], 2651: ['0.39', '0.26', '0.21']....}
I want to merge these dictionaries as:
merged_dict (class, list) or (key, values)= {2480: h1['A', 'C', 'C'], h2 ['C', 'T', 'T'], h1_p1['0.25', '0.1', '0.083'], h1_p2['0.15', '0.15', '0.6'], h2_p1['0.3', '0.19', '0.5'], h2_p2['0.3', '0.3', '0.1'], 2651: h1['T', 'A', 'G'], h2['C', 'C', 'A']....}
So, I want to merge several dictionaries by key while maintaining the order in which the dictionaries are supplied.
I am able to do the merge partially using:
merged = [haplotype_A, haplotype_B, hapA_freq_My, hapB_freq_My....]
merged_dict = {}
for k in haplotype_A:
    merged_dict[k] = tuple(merged_dict[k] for merged_dict in merged)
But I want to add a second level of keys in front of each list, so I can access specific items in a large file when needed.
Downstream, I want to access the values inside this merged dictionary by key in a for loop. Something like:
for k, v in merged_dict.items():
    h1_p1_sum = sum(float(x) for x in v[index] or v[h1_p1])
    h1_p1_prod = mul(float(x) for x in v[index] or v[h1_p1])
    h1_string = "-".join(str(x) for x in v[h1_index_level])
and the ability to print or write it to the file line by line
print (h1_string)
print (h1_p1_sum)
I have read several examples using defaultdict and other dicts, but I am not able to wrap my head around the process. I have been able to do simple operations, but something like this seems a little complicated. I would really appreciate an explanation of each step of the process.
Thank you in advance !
If I understand you correctly, you want this:
from collections import defaultdict
merged = {'h1': haplotype_A, 'h2': haplotype_B, 'h3': hapA_freq_My, ...}
merged_dict = defaultdict(dict)
for var_name in merged:
    for k in merged[var_name]:
        merged_dict[k][var_name] = merged[var_name][k]
This should give you an output of:
>>> merged_dict
{2480: {'h1': ['A', 'C', 'C'], 'h2': ['C', 'T', 'T'], ..}, 2651: {...}}
provided, of course, that the variables hold the example data from your question.
You can access them via nested for loops:
for k in merged_dict:
    for sub_key in merged_dict[k]:
        print(merged_dict[k][sub_key]) # print entire list
        for item in merged_dict[k][sub_key]:
            print(item) # prints item in list
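To tie this back to the sums, products and joined strings asked about in the question, here is a minimal runnable sketch in Python 3. It uses only the two example entries from the question; the labels 'h1' and 'h1_p1' and the name summary are assumptions chosen for illustration, and math.prod requires Python 3.8 or later.

```python
from collections import defaultdict
from math import prod  # available from Python 3.8

# Two of the example dictionaries from the question.
H1dict = {2480: ['A', 'C', 'C'], 2651: ['T', 'A', 'G']}
H1_p1_values = {2480: ['0.25', '0.1', '0.083'], 2651: ['0.43', '0.11', '0.23']}

# The label given to each source becomes the second-level key.
sources = {'h1': H1dict, 'h1_p1': H1_p1_values}

merged_dict = defaultdict(dict)
for name, source in sources.items():
    for key, values in source.items():
        merged_dict[key][name] = values

# Per-key results: (joined string, sum, product).
summary = {}
for key, fields in merged_dict.items():
    h1_string = "-".join(fields['h1'])
    h1_p1_sum = sum(float(x) for x in fields['h1_p1'])
    h1_p1_prod = prod(float(x) for x in fields['h1_p1'])
    summary[key] = (h1_string, h1_p1_sum, h1_p1_prod)
    print(key, h1_string, h1_p1_sum, h1_p1_prod)
```

Because regular dicts keep insertion order in Python 3.7 and later, the sub-keys come out in the order the sources were listed.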

Why iterative loop to remove items in list stops

I'm new to Python and trying to understand how this loop, which is intended to remove all items from the list, handles the list's indexes and why it stops where it does...
Why does this loop:
foo = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
for i in foo:
    foo.remove(i)
print foo
Stop here?
['b', 'd', 'f', 'h']
Instead of here?
['H']
Also, what's happening "under the hood" with the indexes here?
With every iteration, is Python keeping track of which index is next while at the same time, once an item is removed the item to its right moves one index to the left (and that's why it's skipping every other item)?
Take a shorter list, ['A', 'C', 'D', 'E'], as an example. The loop starts at index zero and removes the "A" there. It then moves to index one and removes the "D" there (not "C", because "C" is at index zero by this point). Only two items are left in the list, so the loop can't move on to index two, and it ends.
Perhaps instead of a for loop, you could use a while loop that continues until the list is empty.
foo = ['A', 'C', 'D', 'E']
while foo:
    foo.pop(0)
print foo
... Or you could iterate over a copy of the list, which won't change from underneath you as you modify foo. Of course, this uses a little extra memory.
foo = ['A', 'C', 'D', 'E']
for i in foo[:]:
    foo.remove(i)
print foo
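If the goal is to drop only some items rather than all of them, building a new list is usually the least error-prone route. The following is a small Python 3 sketch; the names unwanted and emptied are made up for illustration.

```python
foo = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
unwanted = {'a', 'e'}  # hypothetical set of items to drop

# Build the filtered list first, then write it back in place with a slice
# assignment, so other references to the same list see the change too.
foo[:] = [item for item in foo if item not in unwanted]
print(foo)

# To empty a list completely, no loop is needed at all.
emptied = ['A', 'C', 'D', 'E']
del emptied[:]
print(emptied)
```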
To understand why this happens, let us look step by step at what goes on internally.
Step 1:
>>> foo = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
Here, a new list object is created and is assigned to foo.
Step 2:
>>> for i in foo:
Now the iteration starts. The loop variable i is assigned the item at index 0, which is 'a'.
Step 3:
>>> foo.remove(i)
>>> print foo
['b', 'c', 'd', 'e', 'f', 'g', 'h']
Here .remove(i) is .remove('a'): it removes the first item equal to 'a', which is the one at index 0. The remaining items shift left, so the list now has 'b' at index 0, 'c' at index 1 and so on.
Step 4:
>>> for i in foo:
For the next iteration, the list iterator advances to index 1, so i is assigned the item currently at index 1, which is 'c'. 'b' has moved to index 0 and is skipped.
Step 5:
>>> foo.remove(i)
>>> print foo
['b', 'd', 'e', 'f', 'g', 'h']
This time .remove(i) is .remove('c'), which removes 'c' from the list. The list now has 'b' at index 0, 'd' at index 1 and so on.
Step 6:
>>> for i in foo:
For the next iteration, the iterator advances to index 2, so i is assigned the item currently at index 2, which is 'e'.
Step 7:
>>> foo.remove(i)
>>> print foo
['b', 'd', 'f', 'g', 'h']
This time .remove(i) is .remove('e'), which removes 'e' from the list. The indices of the remaining items shift again, as in step 5 above.
Step 8:
>>> for i in foo:
For the next iteration, the iterator advances to index 3, so i is assigned the item currently at index 3, which is 'g'.
Step 9:
>>> foo.remove(i)
>>> print foo
['b', 'd', 'f', 'h']
This time .remove(i) is .remove('g'), which removes 'g' from the list.
Step 10:
>>> for i in foo:
Now i should be assigned the item at index 4, but the list has been reduced to 4 items (indices 0 to 3), so the loop stops here.
>>> foo
['b', 'd', 'f', 'h']
Above is the final list after execution.
SOME CONCLUSIONS:
NEVER CHANGE THE LENGTH OF LISTS WHILE ITERATING ON THEM. In simple words, don't modify the original list while performing iteration on it.
When you call .remove() on a list inside a for loop over that same list, the loop tracks its position by index, not by item. .remove() removes by value, the remaining items shift left, and the item that slides into the current position is skipped.
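The walkthrough above can be reproduced with a small simulation. This is only a sketch in Python 3 that mimics the list iterator with an explicit index; the function name remove_while_iterating is made up, and it is not how CPython is implemented internally, although it models the same position-based behaviour.

```python
def remove_while_iterating(items):
    """Mimic `for i in items: items.remove(i)` using an explicit index."""
    items = list(items)   # work on a copy so the caller's data is untouched
    index = 0             # the position the list iterator would visit next
    trace = []
    while index < len(items):
        current = items[index]
        items.remove(current)   # removes by value; later items shift left
        trace.append((index, current, list(items)))
        index += 1              # the iterator advances despite the shift
    return items, trace

remaining, trace = remove_while_iterating("abcdefgh")
for index, removed, after in trace:
    print(index, removed, after)
print(remaining)
```

Each trace line shows the index visited, the item removed there, and the list afterwards, which matches steps 2 to 9.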

Python: out of range list for random creation of alphabet

I want to create a random alphabet. So my code is the following:
alphabet = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
def new_alphabet():
    for i in range(0, 26):
        j = 25
        my_new_alphabet = [None] * 26
        my_new_alphabet[i] = alphabet[random.randint(0, j)]
        alphabet.remove(my_new_alphabet[i])
        j = j-1
    return my_new_alphabet
print new_alphabet()
But when I try to execute it:
my_new_alphabet[i] = alphabet[random.randint(0, j)]
IndexError: list index out of range
It is probably something fairly simple but I cannot manage to find where the problem is. Thanks in advance.
There are some problems with your code.
Your alphabet is missing the letter "S", so it has only 25 elements. The ranges for your loop and for your random indices are therefore off by one, hence the out-of-range error.
The lines j = 25 and my_new_alphabet = [None] * 26 should go before the loop; otherwise you reset them in each iteration, so j never shrinks along with alphabet.
Also, you could drop j entirely and just use the bounds of the actual alphabet list instead:
import random

def new_alphabet():
    my_new_alphabet = []
    while alphabet:
        letter = alphabet[random.randint(0, len(alphabet) - 1)]
        # or just use this: letter = random.choice(alphabet)
        alphabet.remove(letter)
        my_new_alphabet.append(letter)
    return my_new_alphabet
Or just use random.shuffle, which does exactly what you want:
def new_alphabet():
    my_new_alphabet = list(alphabet) # copy
    random.shuffle(my_new_alphabet)
    return my_new_alphabet
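Another option, sketched here for Python 3, avoids typing the alphabet by hand (which is how the "S" went missing) by taking it from string.ascii_uppercase, and uses random.sample to draw all 26 letters without repetition. It leaves the source alphabet untouched; the variable name shuffled is just for illustration.

```python
import random
import string

def new_alphabet():
    """Return the 26 uppercase letters in random order."""
    letters = string.ascii_uppercase  # 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    # Sampling len(letters) items without replacement yields a permutation.
    return random.sample(letters, len(letters))

shuffled = new_alphabet()
print(shuffled)
```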

zipping a list of lists according to index?

How can I use zip to zip a list of lists according to index?
a = [i for i in "four"]
b = [i for i in "help"]
c = [i for i in "stak"]
k = [a,b,c]
print zip(a,b,c)
print zip(k)
zip(a,b,c) prints out [('f', 'h', 's'), ('o', 'e', 't'), ('u', 'l', 'a'), ('r', 'p', 'k')]
This is what I need.
However zip(k) prints out [(['f', 'o', 'u', 'r'],), (['h', 'e', 'l', 'p'],), (['s', 't', 'a', 'k'],)]
This doesn't help at all.
Is there any way to "break up" a list into its individual pieces for the zip function?
I need the list k. This is a simplified example; where I'm using it, k would hold an unknown number of lists.
Try the following code:
zip(*[lists...])
You can put in any number of lists in there (can easily be generated using list comprehension)
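Applied to the lists from the question, the unpacking looks like the sketch below. It is written for Python 3, where zip returns an iterator, so the result is wrapped in list() to display it; the name columns is made up for illustration.

```python
a = list("four")
b = list("help")
c = list("stak")
k = [a, b, c]

# The * operator unpacks k, so zip(*k) is the same call as zip(a, b, c).
columns = list(zip(*k))
print(columns)
# [('f', 'h', 's'), ('o', 'e', 't'), ('u', 'l', 'a'), ('r', 'p', 'k')]
```

This works however many lists k holds. Note that zip stops at the shortest list if their lengths differ.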

How to generate each possible combination of members from two lists (in Python)

I am a Python newbie and I've been trying to find the way to generate each possible combination of members from two lists:
left = ['a', 'b', 'c', 'd', 'e']
right = ['f', 'g', 'h', 'i', 'j']
The resulting list should be something like:
af ag ah ai aj bf bg bh bi bj cf cg ch ci cj etc...
I made several experiments with loops but I can't get it right:
The zip function but it wasn't useful since it just pairs 1 to 1 members:
for x in zip(left,right):
    print x
and looping one list for the other just returns the members of one list repeated as many times as the number of members of the second list :(
Any help will be appreciated. Thanks in advance.
You can use, for example, a list comprehension with two for clauses:
left = ['a', 'b', 'c', 'd', 'e']
right = ['f', 'g', 'h', 'i', 'j']
result = [lc + rc for lc in left for rc in right]
print result
The result will look like:
['af', 'ag', 'ah', 'ai', 'aj', 'bf', 'bg', 'bh', 'bi', 'bj', 'cf', 'cg', 'ch', 'ci', 'cj', 'df', 'dg', 'dh', 'di', 'dj', 'ef', 'eg', 'eh', 'ei', 'ej']
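The standard library's itertools.product gives the same pairs and extends naturally to more than two lists. A short Python 3 sketch, offered as an alternative rather than a replacement for the comprehension above:

```python
from itertools import product

left = ['a', 'b', 'c', 'd', 'e']
right = ['f', 'g', 'h', 'i', 'j']

# product yields every (left item, right item) pair, varying the right
# item fastest, which is the same order as the nested comprehension.
result = [lc + rc for lc, rc in product(left, right)]
print(result)
```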