doctests for randomly generated values

Given the following code:
defmodule Pullapi.Workout do
  import Pullapi.Numbers

  @moduledoc """
  Functions that generate a workout representation
  """

  @doc """
  Returns a pullup set defined by the number of `max_reps` a user can do, a `percentage`, and the
  number of maximum additional or decremented reps, `rep_bound`.

  ## Examples

      iex> Pullapi.Workout.pullup_set(20, 60, 5)
      %{"Action" => "Pullups", "Units" => "14"}

  """
  @spec pullup_set(integer, integer, integer) :: map()
  def pullup_set(max_reps, percentage, rep_bound) do
    median = max_reps * (percentage / 100)
    unit_range = Pullapi.Numbers.median_range(round(median), rep_bound)
    units = Enum.random(unit_range)
    %{"Action" => "Pullups", "Units" => "#{units}"}
  end
end
The doctest fails with:
1) test doc at Pullapi.Workout.pullup_set/3 (1) (PullapiTest)
   test/pullapi_test.exs:4
   Doctest failed
   code:  Pullapi.Workout.pullup_set(20, 60, 5) === %{"Action" => "Pullups", "Units" => "14"}
   left:  %{"Action" => "Pullups", "Units" => "8"}
   stacktrace:
     lib/pullapi/workout.ex:13: Pullapi.Workout (module)
Is there a way of specifying that the "Units" value is randomly generated? It looks like I'm following the way Enum.random is doctested.

Enum.random's doctest is explicitly setting a seed value for the test, which makes the result of future calls to :rand functions deterministic.
iex(1)> for _ <- 1..10 do
...(1)> :rand.seed(:exsplus, {101, 102, 103})
...(1)> Enum.random([1, 2, 3])
...(1)> end
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
The person who wrote the tests most likely ran the functions once to check what values are returned after those seed values are set, and then put them in the doctest. Unless the inner workings of :rand change, those seeds will keep producing the same values, which is good enough for doctests (you can always fix the tests if they break in a future version of Erlang).
So, to fix your doctest, you should execute this code once in iex (you can change the seed values if you want):
:rand.seed(:exsplus, {101, 102, 103})
Pullapi.Workout.pullup_set(20, 60, 5)
And then hardcode the returned values in your doctest. Your tests should now pass until Erlang's rand module's internals change.


Return the last k numbers of a list (Python)

I need the last 9 numbers of a list and I'm sure there is a way to do it with slicing, but I can't seem to get it. I can get the first 9 like this:
num_list[0:9]
You can use negative integers with the slicing operator for that. Here's an example using the Python interactive interpreter:
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
>>> a[-9:]
[4, 5, 6, 7, 8, 9, 10, 11, 12]
The important line is a[-9:].
A negative index counts from the end of the list, so:
num_list[-9:]
Slicing
Python slicing is an incredibly fast operation, and it's a handy way to quickly access parts of your data.
Slice notation to get the last nine elements from a list (or any other sequence that supports it, like a string) would look like this:
num_list[-9:]
When I see this, I read the part in the brackets as "9th from the end, to the end." (Actually, I abbreviate it mentally as "-9, on")
Explanation:
The full notation is
sequence[start:stop:step]
But the colon is what tells Python you're giving it a slice and not a regular index. That's why the idiomatic way of copying lists in Python 2 is
list_copy = sequence[:]
And clearing them is with:
del my_list[:]
(Lists get list.copy and list.clear in Python 3.)
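For example, here are both idioms in action:
>>> original = [1, 2, 3]
>>> copy = original[:]   # shallow copy via a full slice
>>> copy.append(4)
>>> original             # unchanged; the copy is a separate list
[1, 2, 3]
>>> del original[:]      # clear the list in place
>>> original
[]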
Give your slices a descriptive name!
You may find it useful to separate forming the slice from passing it to the list.__getitem__ method (that's what the square brackets do). Even if you're not new to it, it keeps your code more readable, so that others who have to read your code can more readily understand what you're doing.
However, you can't just assign some integers separated by colons to a variable. You need to use the slice object:
last_nine_slice = slice(-9, None)
The second argument, None, is required so that the first argument is interpreted as the start argument; otherwise it would be the stop argument.
You can then pass the slice object to your sequence:
>>> list(range(100))[last_nine_slice]
[91, 92, 93, 94, 95, 96, 97, 98, 99]
islice
islice from the itertools module is another potentially performant way to get this. islice doesn't take negative arguments, so you must first pass your list to reversed, which requires its argument to have a __reversed__ special method - which list does have.
>>> from itertools import islice
>>> islice(reversed(range(100)), 0, 9)
<itertools.islice object at 0xffeb87fc>
islice allows for lazy evaluation of the data pipeline, so to materialize the data, pass it to a constructor (like list):
>>> list(islice(reversed(range(100)), 0, 9))
[99, 98, 97, 96, 95, 94, 93, 92, 91]
The last 9 elements can be read from left to right using numlist[-9:], or from right to left using numlist[:-10:-1], whichever you need. For example (Python 2 shown):
>>> a=range(17)
>>> print a
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
>>> print a[-9:]
[8, 9, 10, 11, 12, 13, 14, 15, 16]
>>> print a[:-10:-1]
[16, 15, 14, 13, 12, 11, 10, 9, 8]
Here are several options for getting the "tail" items of an iterable:
Given
n = 9
iterable = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Desired Output
[2, 3, 4, 5, 6, 7, 8, 9, 10]
Code
We get the latter output using any of the following options:
from collections import deque
import itertools
import more_itertools
# A: Slicing
iterable[-n:]
# B: Implement an itertools recipe
def tail(n, iterable):
    """Return an iterator over the last *n* items of *iterable*.
    >>> t = tail(3, 'ABCDEFG')
    >>> list(t)
    ['E', 'F', 'G']
    """
    return iter(deque(iterable, maxlen=n))

list(tail(n, iterable))
# C: Use an implemented recipe, via more_itertools
list(more_itertools.tail(n, iterable))
# D: islice, via itertools
list(itertools.islice(iterable, len(iterable)-n, None))
# E: Negative islice, via more_itertools
list(more_itertools.islice_extended(iterable, -n, None))
Details
A. Traditional Python slicing is inherent to the language. This option works with sequences such as strings, lists and tuples. However, this kind of slicing does not work on iterators, e.g. iter(iterable).
B. An itertools recipe. It is generalized to work on any iterable and resolves the iterator issue of the previous option. This recipe must be implemented manually, as it is not officially included in the itertools module.
C. Many recipes, including the latter tool (B), have been conveniently implemented in third-party packages. Installing and importing these libraries obviates manual implementation. One of these libraries is called more_itertools (install via > pip install more-itertools); see more_itertools.tail.
D. A member of the itertools library. Note, itertools.islice does not support negative slicing.
E. Another tool is implemented in more_itertools that generalizes itertools.islice to support negative slicing; see more_itertools.islice_extended.
Which one do I use?
It depends. In most cases, slicing (option A, as mentioned in other answers) is the simplest option, as it is built into the language and supports most sequence types. For more general iterators, use any of the remaining options. Note, options C and E require installing a third-party library, which some users may wish to avoid.
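To make option A's iterator limitation concrete: slicing a plain iterator raises a TypeError, while the deque trick at the heart of option B still works:
>>> it = iter([1, 2, 3, 4, 5])
>>> it[-3:]
Traceback (most recent call last):
  ...
TypeError: 'list_iterator' object is not subscriptable
>>> from collections import deque
>>> list(deque(iter([1, 2, 3, 4, 5]), maxlen=3))  # keeps only the last 3 items
[3, 4, 5]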

Can somebody explain what does this 'nx.connected_components()' does?

I got some code from Git and I was trying to understand it. Here's a part of it; I didn't understand the second line of this code.
G = nx.Graph(network_map) # Graph for the whole network
components = list(nx.connected_components(G))
What does the function connected_components do? I went through the documentation and couldn't understand it properly.
nx.connected_components(G) will return "A generator of sets of nodes, one for each component of G". A generator in Python allows iterating over values in a lazy manner (i.e., will generate the next item only when necessary).
The documentation provides the following example:
>>> import networkx as nx
>>> G = nx.path_graph(4)
>>> nx.add_path(G, [10, 11, 12])
>>> [len(c) for c in sorted(nx.connected_components(G), key=len, reverse=True)]
[4, 3]
Let's go through it:
G = nx.path_graph(4) - create the undirected path graph 0 - 1 - 2 - 3
nx.add_path(G, [10, 11, 12]) - add the path 10 - 11 - 12 to G
So, now G is a graph with 2 connected components.
[len(c) for c in sorted(nx.connected_components(G), key=len, reverse=True)] - list the sizes of all connected components in G from the largest to smallest. The result is [4, 3] since {0, 1, 2, 3} is of size 4 and {10, 11, 12} is of size 3.
So just to recap - the result is a generator (lazy iterator) over all connected components in G, where each connected component is simply a set of nodes.
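If you want the components themselves rather than just their sizes, you can materialize the generator directly (continuing the same example; note that the order in which components are yielded is not guaranteed):
>>> list(nx.connected_components(G))
[{0, 1, 2, 3}, {10, 11, 12}]
>>> # or build one subgraph per component
>>> subgraphs = [G.subgraph(c).copy() for c in nx.connected_components(G)]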

How do you Unit Test Python DataFrames

How do I unit test Python DataFrames?
I have functions that take a DataFrame as input and return one as output. Almost every function I have does this. Now if I want to unit test this, what is the best method of doing it? It seems a bit of an effort to create a new DataFrame (with values populated) for every function.
Are there any materials you can refer me to? Should you write unit tests for these functions?
While Pandas' test functions are primarily used for internal testing, NumPy includes a very useful set of testing functions that are documented here: NumPy Test Support.
These functions compare NumPy arrays, but you can get the array that underlies a Pandas DataFrame using the values property. You can define a simple DataFrame and compare what your function returns to what you expect.
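For instance, a minimal sketch along those lines, where my_func is a hypothetical stand-in for the function under test:
import numpy as np
import pandas as pd

def my_func(df):  # hypothetical function under test
    return df * 2

input_df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
expected = pd.DataFrame({"a": [2.0, 4.0], "b": [6.0, 8.0]})

# compare the underlying NumPy arrays, with tolerance for float error
np.testing.assert_allclose(my_func(input_df).values, expected.values)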
One technique you can use is to define one set of test data for a number of functions. That way, you can use Pytest Fixtures to define that DataFrame once, and use it in multiple tests.
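A sketch of that fixture pattern (the fixture and test names are illustrative):
import pandas as pd
import pytest

@pytest.fixture
def sample_df():
    # built once per test that requests it, so each test gets a fresh copy
    return pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

def test_row_count(sample_df):
    assert len(sample_df) == 3

def test_column_sum(sample_df):
    assert sample_df["a"].sum() == 6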
In terms of resources, I found this article on Testing with NumPy and Pandas to be very useful. I also did a short presentation about data analysis testing at PyCon Canada 2016: Automate Your Data Analysis Testing.
You can use pandas' testing functions: they give you flexible ways to compare your computed result with the expected result.
For example:
df1 = pd.DataFrame({'a': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'a': [6, 7, 8, 9, 10]})
expected_res = pd.Series([7, 9, 11, 13, 15])
pd.testing.assert_series_equal(df1['a'] + df2['a'], expected_res, check_names=False)
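The same module can also compare whole DataFrames; for example, reusing df1 from above:
pd.testing.assert_frame_equal(df1 + df1, pd.DataFrame({'a': [2, 4, 6, 8, 10]}))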
For more details, refer to this link.
If you are using pytest, PandasSnapshot will be useful.
# use with pytest
import pandas as pd
from snapshottest_ext.dataframe import PandasSnapshot

def test_format(snapshot):
    df = pd.DataFrame([['a', 'b'], ['c', 'd']],
                      columns=['col 1', 'col 2'])
    snapshot.assert_match(PandasSnapshot(df))
One big con is that the snapshot is no longer human-readable. (Storing the content as CSV is more readable, but it is problematic.)
PS: I am the author of pytest snapshot extension.
I don't think it's hard to create small DataFrames for unit testing:
import pandas as pd
from nose.tools import assert_dict_equal

input_df = pd.DataFrame.from_dict({
    'field_1': [some, values],
    'field_2': [other, values]
})
expected = {
    'result': [...]
}
assert_dict_equal(expected, my_func(input_df).to_dict(), "oops, there's a bug...")
You could use snapshottest and do something like this:
def test_something_works(snapshot):  # snapshot is a pytest fixture from snapshottest
    data_frame = calc_something_and_return_pandas_dataframe()
    snapshot.assert_match(data_frame.to_csv(index=False),
                          'some_module_level_unique_name_for_the_snapshot')
This will create a snapshots folder containing a file with the CSV output, which you can update with --snapshot-update when your code changes.
It works by comparing the data_frame variable to what is saved to disk.
Might be worth mentioning that your snapshots should be checked in to source control.
I would suggest writing the values as CSV in docstrings (or separate files if they're large) and parsing them using pd.read_csv(). You can parse the expected output from CSV too, and compare, or else use df.to_csv() to write a CSV out and diff it.
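A minimal sketch of that approach, with made-up CSV content and a made-up groupby transformation standing in for your own function:
import io
import pandas as pd

input_csv = """id,days
1,30
1,40
2,10"""
expected_csv = """id,total_days
1,70
2,10"""

input_df = pd.read_csv(io.StringIO(input_csv))
expected = pd.read_csv(io.StringIO(expected_csv))

# the transformation under test: total days per id
result = (input_df.groupby("id", as_index=False)["days"].sum()
          .rename(columns={"days": "total_days"}))
pd.testing.assert_frame_equal(result, expected)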
Pandas has built in testing functions, but I don't find the output easy to parse, so I created an open source project called beavis with functions that output error messages that are easier for humans to read.
Here's an example of one of the built in testing methods:
df = pd.DataFrame({"col1": [1042, 2, 9, 6], "col2": [5, 2, 7, 6]})
pd.testing.assert_series_equal(df["col1"], df["col2"])
Here's the error message:
> ???
E AssertionError: Series are different
E
E Series values are different (50.0 %)
E [index]: [0, 1, 2, 3]
E [left]: [1042, 2, 9, 6]
E [right]: [5, 2, 7, 6]
Not very easy to see which rows are mismatched because the output isn't aligned.
Here's how you can write the same test with beavis.
import beavis
beavis.assert_pd_column_equality(df, "col1", "col2")
This'll give you a readable error message with the columns aligned, so mismatched rows are easy to spot.
The built-in assert_frame_equal doesn't give a readable error message either. Here's how you can compare DataFrame equality with beavis.
df1 = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df2 = pd.DataFrame({'col1': [5, 2], 'col2': [3, 4]})
beavis.assert_pd_equality(df1, df2)
The frame-fixtures Python package (of which I am an author) is designed to make it easy to "create a new dataframe (with values populated)" for unit or performance tests.
For example, if you want to test against a DataFrame of floats and strings with a numerical index, you can use a compact string declaration to generate a DataFrame.
>>> ff.Fixture.to_frame('i(I,int)|v(float,str)|s(4,2)').to_pandas()
              0     1
34715   1930.40  zaji
-3648  -1760.34  zJnC
91301   1857.34  zDdR
30205   1699.34  zuVU
>>> ff.Fixture.to_frame('i(I,int)|v(float,str)|s(8,3)').to_pandas()
               0     1        2
34715    1930.40  zaji   694.30
-3648   -1760.34  zJnC   -72.96
91301    1857.34  zDdR  1826.02
30205    1699.34  zuVU   604.10
54020     268.96  zKka  1080.40
129017   3511.58  zJXD  2580.34
35021    1175.36  zPAQ   700.42
166924   2925.68  zyps  3338.48

Counterbalancing in OpenSesame

I am writing an inline_script in OpenSesame (Python).
Can anyone tell me what's wrong here? (I think it's something very simple, but I cannot find it.)
When I put the numbers in explicitly, List = [1,2,3,4,5,6,7], the first line works, but the second does not work :(
BalanceList1 = range(1:7) + range(13:19) #does not work
if self.get('subject_nr') == "BalanceList1":
#here follows a list of commands
BalanceList2 = list(range(7:13))+list(range(19:25)) #does not work either
elif self.get('subject_nr') == "BalanceList2":
#other commands
In Python 2.x you can do the following:
BalanceList1 = range(1,6) + range(13,19)
which will generate 2 lists and add them together in BalanceList1:
[1, 2, 3, 4, 5, 13, 14, 15, 16, 17, 18]
In Python 3.x, range doesn't return a list anymore but a lazy range object (and xrange is gone), so you have to explicitly convert it to a list:
BalanceList1 = list(range(1,6))+list(range(13,19))
A more efficient way, which avoids creating a temporary list for the second range, would be:
BalanceList1 = list(range(1,6))
BalanceList1.extend(range(13,19)) # avoids creating the list for 13->18
This is more efficient than concatenating two lists, which builds both intermediate lists before merging them.
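If you prefer a single expression, itertools.chain also avoids building intermediate lists:
from itertools import chain

# lazily walks both ranges, building only the final list
BalanceList1 = list(chain(range(1, 6), range(13, 19)))
# [1, 2, 3, 4, 5, 13, 14, 15, 16, 17, 18]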

Django model group as a list

I have a model with test data as below
id  days
1   30
1   40
2   10
2   20
1   90
I want output as
1, [30,40,90]
2, [10,20]
How can I get this in Django?
It's not so much Django as pure Python. To get the result as a mapping with 'id' as the key:
result = {}
for obj in Mymodel.objects.all():
    if obj.id in result:
        result[obj.id].append(obj.days)
    else:
        result[obj.id] = [obj.days]

print result
# {1: [30, 40, 90], 2: [10, 20]}
The order of the elements in each list is not defined. If you require them to be ordered, it would be best to append .order_by('days') to the QuerySet.
A final remark: Your 'id' is not unique. I would consider a non-pk-column named 'id' a bad practice, since 'id' is Django's default name for the automatically created pk-field.
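As a slightly more idiomatic sketch of the same grouping, you can use collections.defaultdict and let the database do the ordering (assuming the model is named Mymodel, as in the answer):
from collections import defaultdict

result = defaultdict(list)
for obj in Mymodel.objects.order_by('days'):
    result[obj.id].append(obj.days)  # missing keys start as empty lists

# dict(result) -> {1: [30, 40, 90], 2: [10, 20]}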