Merge elements of inner list with outer list - python-2.7

I have the following list:
mylist = ['or', ['or', 'R', ['not', 'B']], 'W']
and I want to remove the double occurrence of 'or' within the list so that the final result is:
['or', 'R', ['not', 'B'], 'W']

You could use a recursive method inside a class to remove, from every nested list, each string that also appears at the first level of your list. This also cleans those strings from lists at greater depths.
class FirstLevelRepeated(object):
    def __init__(self, my_list):
        self.mylist = my_list
    def removeRepeated(self):
        # Start at the top level and clean every nested list found there
        for obj_index in range(len(self.mylist)):
            if isinstance(self.mylist[obj_index], list):
                self.RemoveRepeated(self.mylist[obj_index])
    def RemoveRepeated(self, actual_object):
        # Collect the indices of strings that also appear at the first level
        index_to_remove = []
        for actual_object_index in range(len(actual_object)):
            if isinstance(actual_object[actual_object_index], list):
                # Recurse into deeper lists
                self.RemoveRepeated(actual_object[actual_object_index])
            else:
                if actual_object[actual_object_index] in self.mylist:
                    index_to_remove.append(actual_object_index)
        # Pop from the end so the remaining indices stay valid
        for index in index_to_remove[::-1]:
            actual_object.pop(index)
        return actual_object
mylist = ['or', ['or', 'R', ['not', 'B']], 'W']
print(mylist)
list_of_lists = FirstLevelRepeated(mylist)
list_of_lists.removeRepeated()
print(list_of_lists.mylist)
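Note that the class above removes the inner 'or' but keeps the inner list nested, so it prints ['or', ['R', ['not', 'B']], 'W'] rather than the requested ['or', 'R', ['not', 'B'], 'W']. If you want that exact result, one option is to treat each list as a prefix expression (operator first, operands after) and splice any operand list whose operator matches its parent's operator into the parent. This is a minimal sketch under that assumption; merge_same_operator is a hypothetical helper name, and it works on both Python 2.7 and 3.

```python
def merge_same_operator(expr):
    """Flatten nested lists that repeat their parent's operator.

    Assumes each list is a prefix expression: expr[0] is the operator
    (e.g. 'or', 'not') and expr[1:] are its operands.
    """
    if not isinstance(expr, list) or not expr:
        return expr
    op = expr[0]
    merged = [op]
    for operand in expr[1:]:
        operand = merge_same_operator(operand)  # simplify deeper levels first
        if isinstance(operand, list) and operand and operand[0] == op:
            merged.extend(operand[1:])  # splice the inner operands into the parent
        else:
            merged.append(operand)
    return merged


mylist = ['or', ['or', 'R', ['not', 'B']], 'W']
print(merge_same_operator(mylist))  # ['or', 'R', ['not', 'B'], 'W']
```

Unlike the class, this returns a new list and leaves mylist unchanged, and it also collapses chains such as ['and', ['and', 'a', ['and', 'b', 'c']], 'd'].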

Related

Appending a list built using conditional and appended values to a list of lists

I'm currently working with two large CSV files of numerical data. One of them, which we will call X, is composed entirely of numerical data for test subjects. The columns of X are arranged as health measurements like so: (id, v1, v2, v3, v4). I am trying to take this information and create a list of lists where each list contains the information for a single person, i.e.:
X = [['1','a','b','c','d'],
     ['1','e','f','g','h'],
     ['2','i','j','k','l'],
     ['3','m','n','o','p']]
listoflists = [[['1','a','b','c','d'], ['1','e','f','g','h']],  # first row
               ['2','i','j','k','l'],                          # second
               ['3','m','n','o','p']]                          # third
(Let me know if I should edit the formatting: I wanted to present X as columns for readability. For the list of lists I ran out of room, so listoflists = [a, b, c], where a is the first row, b is the second, and c is the third.)
I've tried something like the following, but my biggest issue is that I'm not sure where to create the list of entries with matching ids and then append it to the "master list".
#create a set that holds the values of the subject ids.
ids=list(set([item[0] for item in X]))
#create the list of lists i want
listolists=[]
for value in ids:
    listolists.append(sublist)
    for i in range(len(X)):
        sublist=[] #I'm not sure where to create sublists of
                   #matching data and append to listolists
        if value == X[i][0]:
            sublist.append(X[i])
All help is appreciated. Thanks.
Here is one way to do it:
X = [
    ['1','a','b','c','d'],
    ['1','e','f','g','h'],
    ['2','i','j','k','l'],
    ['3','m','n','o','p'],
]
numbers = {x[0] for x in X}
output = []
for num in sorted(numbers):
    new_list = [sub_list for sub_list in X if sub_list[0] == num]
    output.append(new_list)
print(output)
Output:
[[['1', 'a', 'b', 'c', 'd'], ['1', 'e', 'f', 'g', 'h']],
[['2', 'i', 'j', 'k', 'l']],
[['3', 'm', 'n', 'o', 'p']]]
If you need the second and third lists not to be nested like the first, let me know.
EDIT: for the exact format specified in your question:
X = [
    ['1','a','b','c','d'],
    ['1','e','f','g','h'],
    ['2','i','j','k','l'],
    ['3','m','n','o','p'],
]
numbers = {x[0] for x in X}
output = []
for num in sorted(numbers):
    new_list = [sub_list for sub_list in X if sub_list[0] == num]
    if len(new_list) > 1:
        output.append(new_list)
    else:
        output.append(new_list[0])
print(output)
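The answer above scans X once per distinct id, and sorted() orders the string ids lexicographically (so '10' would come before '2'). A single-pass alternative is to group rows in a dictionary keyed by id. This is a minimal sketch assuming the groups should follow the order in which each id first appears in X; group_by_id is a hypothetical helper, and OrderedDict is used so the order also holds on Python 2.7 and 3.6.

```python
from collections import OrderedDict


def group_by_id(rows):
    """Group rows that share the same first column, in a single pass.

    Groups are returned in the order each id first appears in `rows`.
    """
    groups = OrderedDict()
    for row in rows:
        # setdefault creates the list the first time an id is seen
        groups.setdefault(row[0], []).append(row)
    return list(groups.values())


X = [['1', 'a', 'b', 'c', 'd'],
     ['1', 'e', 'f', 'g', 'h'],
     ['2', 'i', 'j', 'k', 'l'],
     ['3', 'm', 'n', 'o', 'p']]
print(group_by_id(X))
```

Like the first version of the answer, every group is a list of rows, including groups with a single row.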

Dictionary w nested dicts to list in specified order

Sorry if this post seems redundant. I've looked through a bunch of other posts and can't seem to find what I'm looking for, perhaps because I'm a Python newbie trying to write basic code.
Given a dictionary of any size, some keys have a single value and others have a nested dictionary as their value.
I would like to convert the dictionary into a list (including the nested values as list items) but in a specific order.
For example:
d = {'E':{'e3': 'Zzz', 'e1':'Xxx', 'e2':'Yyy'}, 'D': {'d3': 'Vvv', 'd1':'Nnn', 'd2':'Kkk'}, 'U': 'Bbb'}
and I would like it to look like this:
order_list = ['U', 'D', 'E'] # given this order...
final_L = ['U', 'Bbb', 'D', 'd1', 'Nnn', 'd2', 'Kkk', 'd3', 'Vvv', 'E', 'e1', 'Xxx', 'e2', 'Yyy', 'e3', 'Zzz']
I can get the main keys into order, but not the nested values. Here's what I have so far:
d = {'E':{'e3': 'Zzz', 'e1':'Xxx', 'e2':'Yyy'}, 'D': {'d3': 'Vvv', 'd1':'Nnn', 'd2':'Kkk'}, 'U': 'Bbb'}
order_list = ['U', 'D', 'E']
temp_list = []
for x in order_list:
    for key, value in d.items():
        if key == x:
            temp_list.append([key, value])
final_L = [item for sublist in temp_list for item in sublist]
print(final_L)
My current output is:
['U', 'Bbb', 'D', {'d1': 'Nnn', 'd2': 'Kkk', 'd3': 'Vvv'}, 'E', {'e1': 'Xxx', 'e3': 'Zzz', 'e2': 'Yyy'}]
There are a couple of easy transformations you can make with a list comprehension:
>>> [(k, sorted(d[k].items()) if isinstance(d[k], dict) else d[k]) for k in 'UDE']
[('U', 'Bbb'),
('D', [('d1', 'Nnn'), ('d2', 'Kkk'), ('d3', 'Vvv')]),
('E', [('e1', 'Xxx'), ('e2', 'Yyy'), ('e3', 'Zzz')])]
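Now you just need to flatten a list of arbitrary depth; here's a post describing how to do that:
from collections.abc import Iterable  # collections.Iterable was removed in Python 3.10
def flatten(l):
    for el in l:
        # recurse into anything iterable except strings, which would loop forever
        if isinstance(el, Iterable) and not isinstance(el, str):
            yield from flatten(el)
        else:
            yield el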
>>> list(flatten((k, sorted(d[k].items()) if isinstance(d[k], dict) else d[k]) for k in 'UDE'))
['U', 'Bbb', 'D', 'd1', 'Nnn', 'd2', 'Kkk', 'd3', 'Vvv', 'E', 'e1', 'Xxx', 'e2', 'Yyy', 'e3', 'Zzz']
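If the nesting is only ever one level deep, as in the example, the asker's own loop can also be extended directly without a general flatten. This is a minimal sketch under that assumption; ordered_flatten is a hypothetical name, and nested keys are emitted in sorted order to match the expected final_L.

```python
def ordered_flatten(d, order):
    """Flatten d into [key, value, ...] following `order`.

    Assumes each value is either a plain value or a dict one level deep;
    the keys of a nested dict are emitted in sorted order.
    """
    result = []
    for key in order:
        value = d[key]
        result.append(key)
        if isinstance(value, dict):
            for sub_key in sorted(value):
                result.extend([sub_key, value[sub_key]])
        else:
            result.append(value)
    return result


d = {'E': {'e3': 'Zzz', 'e1': 'Xxx', 'e2': 'Yyy'},
     'D': {'d3': 'Vvv', 'd1': 'Nnn', 'd2': 'Kkk'},
     'U': 'Bbb'}
print(ordered_flatten(d, ['U', 'D', 'E']))
# ['U', 'Bbb', 'D', 'd1', 'Nnn', 'd2', 'Kkk', 'd3', 'Vvv', 'E', 'e1', 'Xxx', 'e2', 'Yyy', 'e3', 'Zzz']
```

The recursive flatten generator remains the better choice if dictionaries can nest more than one level.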

How merge dictionary with key values but which contains several different list values?

Someone asked what my input looks like:
the input is the output of a preceding function.
And when I do
print(H1_dict)
The following information is printed to the screen:
defaultdict(<class 'list'>, {2480: ['A', 'C', 'C'], 2651: ['T', 'A', 'G']})
which means the data type is a defaultdict whose default factory is list (the keys are integers and the values are lists).
So something like this:
H1dict = {2480: ['A', 'C', 'C'], 2651: ['T', 'A', 'G'].....}
H2dict = {2480: ['C', 'T', 'T'], 2651: ['C', 'C', 'A'].....}
H1_p1_values = {2480: ['0.25', '0.1', '0.083'], 2651: ['0.43', '0.11', '0.23']....}
H1_p2_values = {2480: ['0.15', '0.15', '0.6'], 2651: ['0.26', '0.083', '0.23']....}
H2_p1_values = {2480: ['0.3', '0.19', '0.5'], 2651: ['0.43', '0.17', '0.083']....}
H2_p2_values = {2480: ['0.3', '0.3', '0.1'], 2651: ['0.39', '0.26', '0.21']....}
I want to merge these dictionaries as:
merged_dict (class, list) or (key, values)= {2480: h1['A', 'C', 'C'], h2 ['C', 'T', 'T'], h1_p1['0.25', '0.1', '0.083'], h1_p2['0.15', '0.15', '0.6'], h2_p1['0.3', '0.19', '0.5'], h2_p2['0.3', '0.3', '0.1'], 2651: h1['T', 'A', 'G'], h2['C', 'C', 'A']....}
So, I want to merge several dictionaries using key values but maintain the order in which different dictionary are supplied.
I am able to merge the dictionaries partially using:
merged = [haplotype_A, haplotype_B, hapA_freq_My, hapB_freq_My....]
merged_dict = {}
for k in haplotype_A:
    merged_dict[k] = tuple(d[k] for d in merged)
But I want to add another level of keys in front of each list, so I can access specific items in a large file when needed.
Downstream, I want to access the values inside this merged dictionary by key each time in a for loop, either by index or by a sub-key such as 'h1_p1'. Something like:
from functools import reduce
import operator
for k, v in merged_dict.items():
    h1_p1_sum = sum(float(x) for x in v['h1_p1'])
    h1_p1_prod = reduce(operator.mul, (float(x) for x in v['h1_p1']), 1.0)
    h1_string = "-".join(str(x) for x in v['h1'])
    # and be able to print or write the results to a file line by line
    print(h1_string)
    print(h1_p1_sum)
I have read several examples using defaultdict and other dicts, but I'm not able to wrap my head around the process. I have been able to do simple operations, but something like this seems a little complicated. I would really appreciate any explanation you could add for each step of the process.
Thank you in advance !
If I understand you correctly, you want this:
from collections import defaultdict
merged = {'h1': haplotype_A, 'h2': haplotype_B, 'h3': hapA_freq_My, ...}
merged_dict = defaultdict(dict)
for var_name in merged:
    for k in merged[var_name]:
        merged_dict[k][var_name] = merged[var_name][k]
This should give you an output like:
>>> merged_dict
defaultdict(<class 'dict'>, {2480: {'h1': ['A', 'C', 'C'], 'h2': ['C', 'T', 'T'], ...}, 2651: {...}})
assuming, of course, that the variables hold the same data as in your example.
You can access them via nested for loops:
for k in merged_dict:
    for sub_key in merged_dict[k]:
        print(merged_dict[k][sub_key])  # prints the entire list
        for item in merged_dict[k][sub_key]:
            print(item)  # prints each item in the list
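Putting the pieces together, the sketch below builds the nested structure from a few of the question's dictionaries and then computes the sum, product and joined string the asker described. It assumes the labels 'h1', 'h2' and 'h1_p1' (chosen to match the question's notation), uses only the two keys shown in the question, and uses OrderedDict so the labels keep the order in which the dictionaries are supplied on any Python version; merge_by_key is a hypothetical helper name.

```python
from collections import OrderedDict
from functools import reduce
import operator

# Sample data from the question (three of the dictionaries, two keys each)
H1dict = {2480: ['A', 'C', 'C'], 2651: ['T', 'A', 'G']}
H2dict = {2480: ['C', 'T', 'T'], 2651: ['C', 'C', 'A']}
H1_p1_values = {2480: ['0.25', '0.1', '0.083'], 2651: ['0.43', '0.11', '0.23']}

# Label each source dict; OrderedDict keeps the labels in the order given
sources = OrderedDict([('h1', H1dict), ('h2', H2dict), ('h1_p1', H1_p1_values)])


def merge_by_key(sources):
    """Return {key: OrderedDict(label -> list)} for every key in the sources."""
    merged = OrderedDict()
    for label, source in sources.items():
        for key in sorted(source):
            merged.setdefault(key, OrderedDict())[label] = source[key]
    return merged


merged_dict = merge_by_key(sources)

for key, entry in merged_dict.items():
    h1_string = '-'.join(entry['h1'])
    h1_p1_sum = sum(float(x) for x in entry['h1_p1'])
    # reduce with operator.mul multiplies all values; 1.0 is the start value
    h1_p1_prod = reduce(operator.mul, (float(x) for x in entry['h1_p1']), 1.0)
    print(key, h1_string, h1_p1_sum, h1_p1_prod)
```

Each print call produces one line per key, so the same values can be written to a file with file.write instead.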

Splitting a list into new lists

So I have a list plaintext that contains ['A', 'A', 'R', 'O', 'N'], and I want to end up with a set of lists called letter1, letter2, letter3, and so on, that contain ['A'], ['A'], ['R'], and so on. How do I go about doing this without cloning the list five times and removing the extra parts?
You can iterate over the list:
In [1]: letters = ['A', 'A', 'R', 'O', 'N']
#use list comprehension to iterate over the list and place each element into a list
In [2]: [[l] for l in letters]
Out[2]: [['A'], ['A'], ['R'], ['O'], ['N']]
To give each list a name, we typically use a dictionary. For example:
#create a dictionary
letters_dict = {}
#iterate over the original list as above, except now saving to a dictionary
for i in range(len(letters)):
    letters_dict['letter'+str(i+1)] = [letters[i]]
This gives you the following:
In [4]: letters_dict
Out[4]:
{'letter1': ['A'],
'letter2': ['A'],
'letter3': ['R'],
'letter4': ['O'],
'letter5': ['N']}
You can now access each of the lists as follows:
In [5]: letters_dict['letter1']
Out[5]: ['A']
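The same dictionary can be built in one expression with a dict comprehension and enumerate; start=1 makes the names begin at letter1 instead of letter0. This is a minimal alternative sketch to the range(len(...)) loop, not a different technique.

```python
letters = ['A', 'A', 'R', 'O', 'N']

# enumerate(..., start=1) yields (1, 'A'), (2, 'A'), ... so names start at letter1
letters_dict = {'letter{}'.format(i): [letter] for i, letter in enumerate(letters, start=1)}

print(letters_dict['letter3'])  # ['R']
```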
Finally, just for completeness, there's a cool extension of the dictionary method. Namely, using code from this thread, you can do the following:
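#create a class
class atdict(dict):
    # attribute access is forwarded to item access, so a missing
    # attribute raises KeyError rather than AttributeError
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__
#create an instance of the class using our dictionary:
l = atdict(letters_dict)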
This way, you can do the following:
In [11]: l.letter1
Out[11]: ['A']
In [12]: l.letter5
Out[12]: ['N']
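On Python 3.3 and later, the standard library's types.SimpleNamespace gives the same attribute-style access without subclassing dict, and a missing name raises AttributeError, so hasattr and getattr with a default behave as usual. This is a minimal sketch of that swapped-in alternative, not the code from the linked thread.

```python
from types import SimpleNamespace

letters = ['A', 'A', 'R', 'O', 'N']
letters_dict = {'letter{}'.format(i): [letter] for i, letter in enumerate(letters, start=1)}

# Each dictionary key becomes an attribute of the namespace object
ns = SimpleNamespace(**letters_dict)
print(ns.letter1)  # ['A']
print(ns.letter5)  # ['N']
# Missing names raise AttributeError, so getattr's default is returned here
print(getattr(ns, 'letter6', None))  # None
```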
If you have no desire to store the values in an iterable or referenceable object (i.e. a dictionary, list, or class), as you suggest in your question, then you could literally do the following:
letter1 = [letters[0]]
letter2 = [letters[1]]
letter3 = [letters[2]]
#and so forth ...
but as you can see, even with five variables this becomes tedious.

Same code different results from Py2.7 to Py3.4. Where is the mistake?

I am refactoring a few lines of code found in Harrington, P. (2012). Machine Learning in Action, Chapters 11 and 12. The code is supposed to build an FP-tree from a test dataset, and it goes as follows:
from __future__ import division, print_function
class treeNode:
    '''
    Basic data structure for an FP-tree (Frequent-Pattern).
    '''
    def __init__(self, nameValue, numOccur, parentNode):
        self.name = nameValue
        self.count = numOccur
        self.nodeLink = None
        self.parent = parentNode
        self.children = {}
    def inc(self, numOccur):
        '''
        Increments the count variable by a given amount.
        '''
        self.count += numOccur
    def disp(self, ind=1):
        '''
        Displays the tree in text.
        '''
        print('{}{}:{}'.format('-'*(ind-1), self.name, self.count))
        for child in list(self.children.values()):
            child.disp(ind+1)
def createTree(dataSet, minSup=1):
    '''
    Takes the dataset and the minimum support
    and builds the FP-tree.
    '''
    headerTable = {} #stores the counts
    #loop over the dataset and count the frequency of each term.
    for trans in dataSet:
        for item in trans:
            headerTable[item] = headerTable.get(item, 0) + dataSet[trans]
    #scan the header table and delete items occurring less than minSup
    for k in list(headerTable.keys()):
        if headerTable[k] < minSup:
            del(headerTable[k])
    freqItemSet = set(headerTable.keys())
    #if no item is frequent, quit
    if len(freqItemSet) == 0:
        return None, None
    #expand the header table
    #so it can hold a count and pointer to the first item of each type.
    for k in list(headerTable.keys()):
        headerTable[k] = [headerTable[k], None]
    #create the base node, which contains the 'Null Set'
    retTree = treeNode('Null Set', 1, None)
    #iterate over the dataset again
    #this time using only items that are frequent
    for tranSet, count in list(dataSet.items()):
        localD = {}
        for item in tranSet:
            if item in freqItemSet:
                localD[item] = headerTable[item][0]
        if len(localD) > 0:
            #sort the items and then call updateTree()
            orderedItems = [v[0] for v in sorted(list(localD.items()),
                            key=lambda p: p[1], reverse=True)]
            updateTree(orderedItems, retTree, headerTable, count)
    return retTree, headerTable
def updateTree(items, inTree, headerTable, count):
    if items[0] in inTree.children:
        inTree.children[items[0]].inc(count)
    else:
        #Populate tree with ordered freq itemset
        inTree.children[items[0]] = treeNode(items[0], count, inTree)
        if headerTable[items[0]][1] == None:
            headerTable[items[0]][1] = inTree.children[items[0]]
        else:
            updateHeader(headerTable[items[0]][1], inTree.children[items[0]])
    #Recursively call updateTree on the remaining items
    if len(items) > 1:
        updateTree(items[1::], inTree.children[items[0]], headerTable, count)
def updateHeader(nodeToTest, targetNode):
    while (nodeToTest.nodeLink != None):
        nodeToTest = nodeToTest.nodeLink
    nodeToTest.nodeLink = targetNode
def loadSimpDat():
    simpDat = [['r', 'z', 'h', 'j', 'p'],
               ['z', 'y', 'x', 'w', 'v', 'u', 't', 's'],
               ['z'],
               ['r', 'x', 'n', 'o', 's'],
               ['y', 'r', 'x', 'z', 'q', 't', 'p'],
               ['y', 'z', 'x', 'e', 'q', 's', 't', 'm']]
    return simpDat
def createInitSet(dataSet):
    retDict = {}
    for trans in dataSet:
        retDict[frozenset(trans)] = 1
    return retDict
simpDat = loadSimpDat()
initSet = createInitSet(simpDat)
myFPtree, myHeaderTab = createTree(initSet, 3)
myFPtree.disp()
This code runs without errors in both Python 2.7.9 and 3.4.3. However, the output I get is different. Moreover, the output with Py2.7 is consistent, while running the same code over and over again with Py3.4 leads to different results.
The correct result is the one obtained using Py2.7 but I cannot figure out why it doesn't work on 3.4.
Why?
What is wrong with this code when interpreted with Python3?
The output describes a specific tree. The order of the branches can change, but the underlying tree should be the same. This is always the case with Python 2, where the output looks like this:
-x:1
--s:1
---r:1
-z:5
--x:3
---y:3
----s:2
-----t:2
----r:1
-----t:1
--r:1
It should represent this tree.
        Null
       /    \
      x      z
      |     / \
      s    x   r
      |    |
      r    y
          / \
         s   r
         |   |
         t   t
This is an example of the wrong result I get using Python3.
-z:5
--r:1
--x:3
---t:3
----y:2
-----s:2
----r:1
-----y:1
-x:1
--r:1
---s:1
P.S. I have tried using OrderedDict instead of {}; it doesn't change anything.
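What seems to happen is that you rely on the iteration order of dicts and sets (i.e. the order of d.keys(), d.items(), iterating over a frozenset, etc.). While both Python 2 and Python 3 guarantee that the iteration order is consistent during one execution, they don't guarantee that it is consistent from run to run. Since Python 3.3, string hashes are randomized per process by default, so the order in which `for item in tranSet` visits each frozenset changes between runs. Because sorted() is stable, items with equal counts (r, s, t and y all have count 3 here) keep that arbitrary order, which changes the paths inserted into the tree. This is also why replacing {} with OrderedDict doesn't help: the order already comes from the frozenset.
Therefore it is correct behaviour that the output differs from run to run. That you get the same result on every run in Python 2 is "luck" rather than a guarantee: Python 2.7 leaves hash randomization off unless you run it with -R.
You can get Python 3 to behave deterministically by setting the PYTHONHASHSEED environment variable to a fixed value, but you probably shouldn't rely on dict or set iteration order being deterministic.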
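One way to make the tree reproducible without relying on PYTHONHASHSEED is to give sorted() a secondary key so that ties no longer depend on iteration order. This is a minimal sketch assuming item names are mutually comparable (strings here); order_items is a hypothetical helper whose result would replace the orderedItems = [...] expression in createTree, i.e. orderedItems = order_items(localD).

```python
def order_items(localD):
    """Return the items of localD sorted by descending count.

    Ties are broken by item name, so the result no longer depends on
    the iteration order of the set or dict the counts came from.
    """
    return [item for item, count in sorted(localD.items(), key=lambda p: (-p[1], p[0]))]


# Transaction {'r', 'x', 's'} with its header-table counts at minSup=3
print(order_items({'r': 3, 'x': 4, 's': 3}))  # ['x', 'r', 's']
```

The resulting tree is the same on every run and Python version, but it is not necessarily the tree from the Python 2 output: for the transaction above, Python 2 happened to order the items as x, s, r. Any tie-breaking rule gives a valid FP-tree as long as it is applied consistently to every transaction.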