custom assert in pytest should overrule standard assert

I wrote a custom assert function to compare items in two lists such that the order is not important, using pytest_assertrepr_compare. This works fine and reports a failure when the content of the lists differ.
However, if the custom assert passes, it fails on the default '==' assert because item 0 of one list is unequal to item 0 of the other list.
Is there a way to prevent the default assert from kicking in?
assert ['a', 'b', 'c'] == ['b', 'a', 'c'] # custom assert passes
# default assert fails
The custom assert function is:
def pytest_assertrepr_compare(config, op, left, right):
    equal = True
    if op == '==' and isinstance(left, list) and isinstance(right, list):
        if len(left) != len(right):
            equal = False
        else:
            for l in left:
                if not l in right:
                    equal = False
            if equal:
                for r in right:
                    if not r in left:
                        equal = False
    if not equal:
        return ['Comparing lists:',
                ' vals: %s != %s' % (left, right)]

I found the easiest way is to combine py.test with PyHamcrest. In your example it is easy to use the contains_inanyorder matcher:
from hamcrest import assert_that, contains_inanyorder

def test_first():
    assert_that(['a', 'b', 'c'], contains_inanyorder('b', 'a', 'c'))

You could use a Python set:
assert set(['a', 'b', 'c']) == set(['b', 'a', 'c'])
This assertion passes, because sets compare equal regardless of order.
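As far as I know, pytest_assertrepr_compare only customizes the explanation shown for an assertion that has already failed, so it cannot turn a failing == into a pass. A set comparison also ignores duplicates, which may be too lenient. Here is a minimal sketch that compares element counts instead, assuming the items are hashable. The helper name assert_same_items is hypothetical and not part of pytest:

```python
from collections import Counter


def assert_same_items(left, right):
    """Assert that two lists hold the same items, ignoring order but not duplicates."""
    assert Counter(left) == Counter(right), (
        'Lists differ (order ignored): %s != %s' % (left, right))


# Same items in a different order: passes silently
assert_same_items(['a', 'b', 'c'], ['b', 'a', 'c'])
```

Calling the helper inside a test replaces the bare == comparison, so the default list comparison never runs.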

Is there a way to group_by with_index in Crystal?

So I have this (nicely sorted) array.
And sometimes I need all of the elements from the array. But other times I need all of the even-indexed members together and all of the odd-indexed members together. And then again, sometimes I need it split into three groups with indices 0,3,6 etc. in one group, then 1,4,7 in the next and finally 2,5,8 in the last.
This can be done with group_by and taking the modulus of the index. See for yourself:
https://play.crystal-lang.org/#/r/4kzj
arr = ['a', 'b', 'c', 'd', 'e']
puts arr.group_by { |x| arr.index(x).not_nil! % 1 } # {0 => ['a', 'b', 'c', 'd', 'e']}
puts arr.group_by { |x| arr.index(x).not_nil! % 2 } # {0 => ['a', 'c', 'e'], 1 => ['b', 'd']}
puts arr.group_by { |x| arr.index(x).not_nil! % 3 } # {0 => ['a', 'd'], 1 => ['b', 'e'], 2 => ['c']}
But that not_nil! in there feels like a code-smell / warning that there's a better way.
Can I get the index of the elements without needing to look it up and handle the Nil type?
You can also just do:
arr = ['a', 'b', 'c', 'd', 'e']
i = -1 # start at -1 so that the first element gets index 0
puts arr.group_by { |x| i += 1; i % 1 }
i = -1
puts arr.group_by { |x| i += 1; i % 2 }
i = -1
puts arr.group_by { |x| i += 1; i % 3 }
Besides the nilable return type, it's also very inefficient to call Array#index for each element. This means a runtime of O(N²).
#group_by is used for grouping by value, but you don't need the value here because you only want to group by index. That can be done much more easily than by wrapping #group_by around #index.
A more efficient solution is to loop over the indices and group the values based on the index:
groups = [[] of Char, [] of Char]
arr.each_index do |i|
  groups[i % 2] << arr[i]
end
There is no special method for this, but it's fairly simple to implement yourself.
If you don't need all groups, but only one of them, you can also use Int32#step to iterate over every n-th index (here every third index, starting at 2):
group = [] of Char
2.step(to: arr.size - 1, by: 3) do |i|
  group << arr[i]
end

Dictionary w nested dicts to list in specified order

Sorry if this post seems redundant. I've looked through a bunch of other posts and I can't seem to find what I'm looking for, perhaps because I'm a Python newbie trying to write basic code...
Given a dictionary of any size: Some keys have a single value, others have a nested dictionary as its value.
I would like to convert the dictionary into a list (including the nested values as list items) but in a specific order.
for example:
d = {'E':{'e3': 'Zzz', 'e1':'Xxx', 'e2':'Yyy'}, 'D': {'d3': 'Vvv', 'd1':'Nnn', 'd2':'Kkk'}, 'U': 'Bbb'}
and I would like it to look like this:
order_list = ['U', 'D', 'E'] # given this order...
final_L = ['U', 'Bbb', 'D', 'd1', 'Nnn', 'd2', 'Kkk', 'd3', 'Vvv', 'E', 'e1', 'Xxx', 'e2', 'Yyy', 'e3', 'Zzz']
I can make the main keys fall into order, but not the nested values. Here's what I have so far...
d = {'E':{'e3': 'Zzz', 'e1':'Xxx', 'e2':'Yyy'}, 'D': {'d3': 'Vvv', 'd1':'Nnn', 'd2':'Kkk'}, 'U': 'Bbb'}
order_list = ['U', 'D', 'E']
temp_list = []
for x in order_list:
    for key,value in d.items():
        if key == x:
            temp_list.append([key,value])
final_L = [item for sublist in temp_list for item in sublist]
print(final_L)
My current output is:
['U', 'Bbb', 'D', {'d1': 'Nnn', 'd2': 'Kkk', 'd3': 'Vvv'}, 'E', {'e1': 'Xxx', 'e3': 'Zzz', 'e2': 'Yyy'}]
So there are a couple of easy transformations to make with a list comprehension:
>>> [(k, sorted(d[k].items()) if isinstance(d[k], dict) else d[k]) for k in 'UDE']
[('U', 'Bbb'),
('D', [('d1', 'Nnn'), ('d2', 'Kkk'), ('d3', 'Vvv')]),
('E', [('e1', 'Xxx'), ('e2', 'Yyy'), ('e3', 'Zzz')])]
Now you just need to flatten a list of arbitrary depth. Here's a recipe from another post describing how to do that:
from collections.abc import Iterable

def flatten(l):
    for el in l:
        # Recurse into nested iterables, but treat strings as atoms
        if isinstance(el, Iterable) and not isinstance(el, str):
            yield from flatten(el)
        else:
            yield el
>>> list(flatten((k, sorted(d[k].items()) if isinstance(d[k], dict) else d[k]) for k in 'UDE'))
['U', 'Bbb', 'D', 'd1', 'Nnn', 'd2', 'Kkk', 'd3', 'Vvv', 'E', 'e1', 'Xxx', 'e2', 'Yyy', 'e3', 'Zzz']
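If the nesting is never deeper than one level, as in the example, a general-purpose flatten is not strictly needed. Here is a minimal sketch of a direct loop, assuming nested keys should be sorted alphabetically. The function name ordered_flatten is made up for illustration:

```python
def ordered_flatten(d, order):
    """Flatten d into a list, visiting the top-level keys in the given order."""
    result = []
    for key in order:
        value = d[key]
        result.append(key)
        if isinstance(value, dict):
            # Nested dict: emit its key/value pairs sorted by key
            for sub_key in sorted(value):
                result.extend([sub_key, value[sub_key]])
        else:
            result.append(value)
    return result


d = {'E': {'e3': 'Zzz', 'e1': 'Xxx', 'e2': 'Yyy'},
     'D': {'d3': 'Vvv', 'd1': 'Nnn', 'd2': 'Kkk'},
     'U': 'Bbb'}
print(ordered_flatten(d, ['U', 'D', 'E']))
# ['U', 'Bbb', 'D', 'd1', 'Nnn', 'd2', 'Kkk', 'd3', 'Vvv', 'E', 'e1', 'Xxx', 'e2', 'Yyy', 'e3', 'Zzz']
```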

comparing two lists of unequal length at each index

I have two lists of unequal length such as
list1 = ['G','T','C','A','G']
list2 = ['AAAAA','TTTT','GGGG','CCCCCCCC']
I want to compare these two lists index by index, only against the corresponding positions, i.e. list2[0] against list1[0], list2[1] against list1[1], and so on up to the length of list1.
I want to get two new lists, one holding the mismatches and the other holding the positions of the mismatches. In pseudocode, it can be stated as:
if 'G' == 'GGG' or 'G' # where 'G' is from list1[1] and 'GGG' is from list2[2]
elif 'G' == 'AAA'
{
outlist1 == list1[index] # position of mismatch
outlist2 == 'G/A'
}
OK, this works. There are definitely ways to do it in less code, but I think this is pretty clear:
#Function to process the lists
def get_mismatches(list1,list2):
    #Prepare the output lists
    mismatch_list = []
    mismatch_pos = []
    #Figure out which list is smaller
    smaller_list_len = min(len(list1),len(list2))
    #Loop through the lists checking element by element
    for ind in range(smaller_list_len):
        elem1 = list1[ind][0] #First char of string 1, such as 'G'
        elem2 = list2[ind][0] #First char of string 2, such as 'A'
        #If they match just continue
        if elem1 == elem2:
            continue
        #If they don't match update the output lists
        else:
            mismatch_pos.append(ind)
            mismatch_list.append(elem1+'/'+elem2)
    #Return the output lists
    return mismatch_list,mismatch_pos

#Make input lists
list1 = ['G','T','C','A','G']
list2 = ['AAAAA','TTTT','GGGG','CCCCCCCC']
#Call the function to get the output lists
outlist1,outlist2 = get_mismatches(list1,list2)
#Print the output lists:
print(outlist1)
print(outlist2)
Output:
['G/A', 'C/G', 'A/C']
[0, 2, 3]
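And just to see how short I could get the code, I made this function, which I think is equivalent:
def short_get_mismatches(l1,l2):
    #Build (mismatch, position) pairs, then transpose them into two tuples
    #Note: the unpacking raises ValueError if there are no mismatches at all
    o1,o2 = zip(*[(x[0]+'/'+y[0],i) for i,(x,y) in enumerate(zip(l1,l2)) if x[0] != y[0]])
    return list(o1),list(o2)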
#Make input lists
list1 = ['G','T','C','A','G']
list2 = ['AAAAA','TTTT','GGGG','CCCCCCCC']
#Call the function to get the output lists
outlist1,outlist2 = short_get_mismatches(list1,list2)
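The zip(*...) transposition in the short version has nothing to unpack when every position matches. Here is a sketch of a variant that returns two empty lists in that case, under the same assumption that only the first character of each string is compared. The name safe_get_mismatches is hypothetical:

```python
def safe_get_mismatches(l1, l2):
    """Return (mismatches, positions), comparing first characters pairwise."""
    # zip stops at the shorter list, so no explicit length check is needed
    pairs = [(x[0] + '/' + y[0], i)
             for i, (x, y) in enumerate(zip(l1, l2))
             if x[0] != y[0]]
    mismatches = [m for m, _ in pairs]
    positions = [i for _, i in pairs]
    return mismatches, positions


print(safe_get_mismatches(['G', 'T', 'C', 'A', 'G'],
                          ['AAAAA', 'TTTT', 'GGGG', 'CCCCCCCC']))
# (['G/A', 'C/G', 'A/C'], [0, 2, 3])
```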
EDIT:
I'm not sure if I'm cleaning the sequence as you want w/ the N's and -'s. Is this the answer to the example in your comment?
Unclean list1 ['A', 'T', 'G', 'C', 'A', 'C', 'G', 'T', 'C', 'G']
Clean list1 ['A', 'T', 'G', 'C', 'A', 'C', 'G', 'T', 'C', 'G']
Unclean list2 ['GGG', 'TTTN', '-', 'NNN', 'AAA', 'CCC', 'GCCC', 'TTT', 'CCCTN']
Clean list2 ['GGG', 'TTT', 'AAA', 'CCC', 'GCCC', 'TTT', 'CCCT']
0 A GGG
1 T TTT
2 G AAA
3 C CCC
4 A GCCC
5 C TTT
6 G CCCT
['A/G', 'G/A', 'A/G', 'C/T', 'G/C']
[0, 2, 4, 5, 6]
This works fine for my question:
#!/usr/bin/env python
list1=['A', 'T', 'G', 'C', 'A' ,'C', 'G' , 'T' , 'C', 'G']
list2=[ 'GGG' , 'TTTN' , ' - ' , 'NNN' , 'AAA' , 'CCC' , 'GCCC' , 'TTT' ,'CCCATN' ]
notifications = []
indexes = []
for i in range(min(len(list1), len(list2))):
    item1 = list1[i]
    item2 = list2[i]
    # Skip ' - '
    if item2 == ' - ':
        continue
    # Remove N since it's a wildcard
    item2 = item2.replace('N', '')
    # Remove item1
    item2 = item2.replace(item1, '')
    chars = set(item2)
    # All matched
    if len(chars) == 0:
        continue
    # Sort the leftover characters so the output order is deterministic
    notifications.append('{}/{}'.format(item1, '/'.join(sorted(chars))))
    indexes.append(i)
print(notifications)
print(indexes)
It gives the output as
['A/G', 'G/C', 'C/A/T']
[0, 6, 8]

Same code different results from Py2.7 to Py3.4. Where is the mistake?

I am refactoring a few lines of code found in Harrington, P. (2012). Machine Learning in Action, Chapters 11 and 12. The code is supposed to build an FP-tree from a test dataset, and it goes as follows.
from __future__ import division, print_function

class treeNode:
    '''
    Basic data structure for an FP-tree (Frequent-Pattern).
    '''
    def __init__(self, nameValue, numOccur, parentNode):
        self.name = nameValue
        self.count = numOccur
        self.nodeLink = None
        self.parent = parentNode
        self.children = {}

    def inc(self, numOccur):
        '''
        Increments the count variable by a given amount.
        '''
        self.count += numOccur

    def disp(self, ind=1):
        '''
        Displays the tree in text.
        '''
        print('{}{}:{}'.format('-'*(ind-1),self.name,self.count))
        for child in list(self.children.values()):
            child.disp(ind+1)

def createTree(dataSet, minSup=1):
    '''
    Takes the dataset and the minimum support
    and builds the FP-tree.
    '''
    headerTable = {} #stores the counts
    #loop over the dataset and count the frequency of each term.
    for trans in dataSet:
        for item in trans:
            headerTable[item] = headerTable.get(item, 0) + dataSet[trans]
    #scan the header table and delete items occurring less than minSup
    for k in list(headerTable.keys()):
        if headerTable[k] < minSup:
            del(headerTable[k])
    freqItemSet = set(headerTable.keys())
    #if no item is frequent, quit
    if len(freqItemSet) == 0:
        return None, None
    #expand the header table
    #so it can hold a count and pointer to the first item of each type.
    for k in list(headerTable.keys()):
        headerTable[k] = [headerTable[k], None]
    #create the base node, which contains the 'Null Set'
    retTree = treeNode('Null Set', 1, None)
    #iterate over the dataset again
    #this time using only items that are frequent
    for tranSet, count in list(dataSet.items()):
        localD = {}
        for item in tranSet:
            if item in freqItemSet:
                localD[item] = headerTable[item][0]
        if len(localD) > 0:
            #sort the items and the call updateTree()
            orderedItems = [v[0] for v in sorted(list(localD.items()),
                                                 key=lambda p: p[1], reverse=True)]
            updateTree(orderedItems, retTree, headerTable, count)
    return retTree, headerTable

def updateTree(items, inTree, headerTable, count):
    if items[0] in inTree.children:
        inTree.children[items[0]].inc(count)
    else:
        #Populate tree with ordered freq itemset
        inTree.children[items[0]] = treeNode(items[0], count, inTree)
        if headerTable[items[0]][1] == None:
            headerTable[items[0]][1] = inTree.children[items[0]]
        else:
            updateHeader(headerTable[items[0]][1],inTree.children[items[0]])
    #Recursively call updateTree on the remaining items
    if len(items) > 1:
        updateTree(items[1::], inTree.children[items[0]], headerTable, count)

def updateHeader(nodeToTest, targetNode):
    while (nodeToTest.nodeLink != None):
        nodeToTest = nodeToTest.nodeLink
    nodeToTest.nodeLink = targetNode

def loadSimpDat():
    simpDat = [['r', 'z', 'h', 'j', 'p'],
               ['z', 'y', 'x', 'w', 'v', 'u', 't', 's'],
               ['z'],
               ['r', 'x', 'n', 'o', 's'],
               ['y', 'r', 'x', 'z', 'q', 't', 'p'],
               ['y', 'z', 'x', 'e', 'q', 's', 't', 'm']]
    return simpDat

def createInitSet(dataSet):
    retDict = {}
    for trans in dataSet:
        retDict[frozenset(trans)] = 1
    return retDict

simpDat = loadSimpDat()
initSet = createInitSet(simpDat)
myFPtree, myHeaderTab = createTree(initSet, 3)
myFPtree.disp()
This code runs without errors in both Python 2.7.9 and 3.4.3. However, the output I get is different. Moreover, the output I get with Py2.7 is consistent, while running the same code over and over again with Py3.4 leads to different results.
The correct result is the one obtained using Py2.7, but I cannot figure out why it doesn't work on 3.4.
Why?
What is wrong with this code when interpreted with Python3?
The output describes a well-defined tree. The order of the branches can change, but the underlying tree should be the same. This is always the case with Python2, where the output looks like this:
-x:1
--s:1
---r:1
-z:5
--x:3
---y:3
----s:2
-----t:2
----r:1
-----t:1
--r:1
It should represent this tree.
       Null
      /    \
     x      z
    / \    / \
   s   r  x   r
   |      |
   r      y
         / \
        s   r
        |   |
        t   t
This is an example of the wrong result I get using Python3.
-z:5
--r:1
--x:3
---t:3
----y:2
-----s:2
----r:1
-----y:1
-x:1
--r:1
---s:1
P.S. I have tried to use OrderedDict instead of {}, it doesn't change anything...
What seems to happen is that you rely on the ordering of dict iteration (i.e. the order of d.keys(), d.items() etc.). While both Python2 and Python3 guarantee that the iteration order is consistent during a single execution, they don't guarantee that it is consistent from run to run.
Therefore it is correct behaviour that the order of the output differs from run to run. That you get the same result from run to run in Python2 is pure "luck".
You can get Python3 to behave deterministically by setting the PYTHONHASHSEED environment variable to a fixed value, but you probably shouldn't rely on dict iteration being deterministic.
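One way to remove the dependence on iteration order is to break frequency ties explicitly when sorting the items of each transaction. Here is a minimal sketch, assuming the items are mutually comparable (here, single-character strings). The helper name order_items is hypothetical and would stand in for the sorted(...) expression in createTree:

```python
def order_items(local_counts):
    """Sort items by descending count, breaking ties by the item itself."""
    ranked = sorted(local_counts.items(), key=lambda p: (-p[1], p[0]))
    return [item for item, _ in ranked]


print(order_items({'x': 3, 'z': 5, 'y': 3, 'r': 3}))
# ['z', 'r', 'x', 'y']
```

With a total order on the items, every run should insert each transaction along the same path. The resulting tree may still differ from the one shown for Python2, since the tie-break rule is a choice.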

Pythonic way of checking if several elements are in a list

I have this piece of code in Python:
if 'a' in my_list and 'b' in my_list and 'c' in my_list:
    # do something
    print(my_list)
Is there a more pythonic way of doing this?
Something like (invalid python code follows):
if ('a', 'b', 'c') individual_in my_list:
    # do something
    print(my_list)
if set("abc").issubset(my_list):
    pass  # whatever
The simplest form:
if all(x in my_list for x in 'abc'):
    pass
Often when you have a lot of items in those lists it is better to use a data structure that can look up items without having to compare each of them, like a set.
You can use set operators:
if set('abc') <= set(my_list):
    print('matches')
superset = ('a', 'b', 'c', 'd')
subset = ('a', 'b')
desired = set(('a', 'b', 'c'))
assert desired <= set(superset) # True
assert desired.issubset(superset) # True
assert not desired <= set(subset) # the comparison itself is False