How to use combinations of sets as test data - unit-testing

I would like to test a function with tuples drawn from a set of fringe cases and normal values. For example, while testing a function that returns true whenever it is given three lengths that form a valid triangle, I would have specific cases: negative, small, and large numbers, values close to overflowing, etc. The main aim is to generate combinations of these values, with or without repetition, to get a set of test data.
(inf,0,-1), (5,10,1000), (10,5,5), (0,-1,5), (1000,inf,inf),
...
As a note: I actually know the answer to this, but it might be helpful for others, and a challenge for people here! --will post my answer later on.

Absolutely, especially dealing with lots of these permutations/combinations I can definitely see that the first pass would be an issue.
Interesting implementation in Python, though I wrote a nice one in C and OCaml based on "Algorithm 515" (see below). The authors wrote theirs in Fortran, as was common back then for all the "Algorithm XX" papers (well, that, assembly, or C). I had to rewrite it and make some small improvements so it works with arrays rather than ranges of numbers. This one does random access; I'm still working on getting some nice implementations of the ones mentioned in Knuth's Volume 4, Fascicle 2. I'll leave an explanation of how this works to the reader, though if someone is curious, I wouldn't object to writing something up.
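/** [combination c n p x]
 * get the [x]th (1-based) lexicographically ordered set of [p] elements in [n]
 * output is in [c], and should be sizeof(int)*[p];
 * choose(a, b) is the binomial coefficient, assumed to be defined elsewhere */
void combination(int* c, int n, int p, int x){
    int i, r = 0, k = 0;
    for(i = 0; i < p-1; i++){
        c[i] = (i != 0) ? c[i-1] : 0;
        do {
            c[i]++;
            r = choose(n-c[i], p-(i+1));
            k = k + r;
        } while(k < x);
        k = k - r;
    }
    /* guard p == 1, where c[p-2] would read before the start of the array */
    c[p-1] = ((p > 1) ? c[p-2] : 0) + x - k;
}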
~"Algorithm 515: Generation of a Vector from the Lexicographical Index"; Buckles, B. P., and Lybanon, M. ACM Transactions on Mathematical Software, Vol. 3, No. 2, June 1977.
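For readers who prefer Python, here is a rough port of the C routine above. It is only a sketch: it assumes the same conventions (elements are 1..n, x is 1-based) and uses math.comb (Python 3.8+) in place of the choose helper; the name unrank_combination is made up for illustration.

```python
from math import comb

def unrank_combination(n, p, x):
    """Return the x-th (1-based) p-element subset of 1..n in lexicographic order."""
    c = []
    k = 0      # number of combinations skipped so far
    prev = 0   # previously chosen element (0 before the first choice)
    for i in range(p - 1):
        value = prev
        while True:
            value += 1
            # combinations that start with the current prefix followed by `value`
            r = comb(n - value, p - (i + 1))
            k += r
            if k >= x:
                break
        k -= r  # undo the last block: the target lies inside it
        c.append(value)
        prev = value
    c.append(prev + x - k)
    return c

print([unrank_combination(4, 2, x) for x in range(1, 7)])
```

Because any index can be computed directly, this kind of routine makes it possible to sample test tuples at random positions without enumerating everything before them.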

With the brand new Python 2.6, you have a standard solution with the itertools module that returns the Cartesian product of iterables :
import itertools
print list(itertools.product([1,2,3], [4,5,6]))
[(1, 4), (1, 5), (1, 6),
(2, 4), (2, 5), (2, 6),
(3, 4), (3, 5), (3, 6)]
You can provide a "repeat" argument to perform the product with an iterable and itself:
print list(itertools.product([1,2], repeat=3))
[(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 2, 2),
(2, 1, 1), (2, 1, 2), (2, 2, 1), (2, 2, 2)]
You can also tweak something with combinations as well :
print list(itertools.combinations('123', 2))
[('1', '2'), ('1', '3'), ('2', '3')]
And if order matters, there are permutations :
print list(itertools.permutations([1,2,3,4], 2))
[(1, 2), (1, 3), (1, 4),
(2, 1), (2, 3), (2, 4),
(3, 1), (3, 2), (3, 4),
(4, 1), (4, 2), (4, 3)]
Of course, these functions don't all do exactly the same thing, but you can use them one way or another to solve your problem.
Just remember that you can convert a tuple or a list to a set and vice versa using list(), tuple() and set().
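Applied to the triangle question, the idea might look like the following Python 3 sketch. The is_triangle function is a hypothetical stand-in for the code under test, and the list of fringe values is only an example.

```python
import itertools
import math

def is_triangle(a, b, c):
    """Hypothetical function under test: True if the sides form a valid triangle."""
    x, y, z = sorted((a, b, c))
    return x > 0 and math.isfinite(z) and x + y > z

# Fringe and normal values to combine (an illustrative choice).
fringe = [-1, 0, 1, 5, 10, 1000, math.inf]

# Every ordered triple, repetition allowed: 7 ** 3 = 343 cases.
cases = list(itertools.product(fringe, repeat=3))

# Unordered triples with repetition, if argument order should not matter.
unordered = list(itertools.combinations_with_replacement(fringe, 3))

valid = [t for t in cases if is_triangle(*t)]
print(len(cases), len(unordered), len(valid))
```

Using product with repeat=3 covers argument order as well, which matters if the function might treat its parameters asymmetrically.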

Interesting question!
I would do this by picking combinations, something like the following in python. The hardest part is probably first pass verification, i.e. if f(1,2,3) returns true, is that a correct result? Once you have verified that, then this is a good basis for regression testing.
Probably it's a good idea to make a set of test cases that you know will be all true (e.g. 3,4,5 for this triangle case), and a set of test cases that you know will be all false (e.g. 0,1,inf). Then you can more easily verify the tests are correct.
# xpermutations from http://code.activestate.com/recipes/190465
from xpermutations import *
lengths=[-1,0,1,5,10,0,1000,'inf']
for c in xselections(lengths,3): # or xuniqueselections
    print c
(-1,-1,-1);
(-1,-1,0);
(-1,-1,1);
(-1,-1,5);
(-1,-1,10);
(-1,-1,0);
(-1,-1,1000);
(-1,-1,inf);
(-1,0,-1);
(-1,0,0);
...

I think you can do this with the Row Test Attribute (available in MbUnit and later versions of NUnit) where you could specify several sets to populate one unit test.

While it's possible to create lots of test data and see what happens, it's more efficient to try to minimize the data being used.
From a typical QA perspective, you would want to identify different classifications of inputs. Produce a set of input values for each classification and determine the appropriate outputs.
Here's a sample of classes of input values
valid triangles with large numbers such as (1 billion, 2 billion, 2 billion)
valid triangles with small numbers such as (0.00002, 0.00003, 0.00004)
valid obtuse triangles that are 'almost' flat such as (10, 10, 19.9999)
valid acute triangles that are 'almost' flat such as (10, 10, 0.000001)
invalid triangles with at least one negative value
invalid triangles where the sum of two sides equals the third
invalid triangles where the sum of two sides is less than the third
input values that are non-numeric
...
Once you are satisfied with the list of input classifications for this function, then you can create the actual test data. Likely, it would be helpful to test all permutations of each item. (e.g. (2,3,4), (2,4,3), (3,2,4), (3,4,2), (4,2,3), (4,3,2)) Typically, you'll find there are some classifications you missed (such as the concept of inf as an input parameter).
Running random data for some period of time may be helpful as well; it can find strange bugs in the code, but it is generally not productive.
More likely, this function is being used in some specific context where additional rules are applied (e.g. only integer values, or values must be in 0.01 increments). These add to the list of classifications of input parameters.
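One way to organise this is to keep a representative per input class together with its expected result, and then test every ordering of that representative. The sketch below is in Python; is_triangle is a hypothetical stand-in for the function under test, and the class names and values are illustrative only.

```python
from itertools import permutations

def is_triangle(a, b, c):
    """Hypothetical function under test."""
    x, y, z = sorted((a, b, c))
    return x > 0 and x + y > z

# One representative per input class, paired with the expected result.
classes = {
    "valid, large numbers": ((1e9, 2e9, 2e9), True),
    "valid, almost flat": ((10, 10, 19.9999), True),
    "invalid, negative side": ((-1, 5, 5), False),
    "invalid, two sides sum to the third": ((2, 3, 5), False),
    "invalid, two sides sum to less than the third": ((1, 2, 10), False),
}

def run_classes():
    """Return (class name, tuple) for every ordering that gives the wrong answer."""
    failures = []
    for name, (sides, expected) in classes.items():
        for perm in set(permutations(sides)):
            if is_triangle(*perm) != expected:
                failures.append((name, perm))
    return failures

print(run_classes())
```

When a class is found to be missing later (inf, for example), it only needs one new entry in the table.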

Related

Combination of elements of lists that meet some condition?

Given:
a = [5, 2, 8, 3, 9]
b = [3, 5, 7, 6, 8]
c = [8, 5, 7, 4, 9].
What is needed:
d = [(9, 8), (8, 7), ..., (5, 5, 5), (5, 6, 5), (5, 6, 7), ..., (8, 7, 7), (9, 8, 9), ...].
Description:
(1) In the above example, there are three lists a, b, c having integer elements and the output is another list d of tuples.
(2) The tuples in d have elements belonging to (a and b and c) or (a and b) or (b and c) such that difference between elements within any tuple is not greater than 1.
(3) Problem: How to find the complete list d where we take any element from any input list and find the difference less than or equal to 1. Generalize to more than just three input list: a, b, c, d, e, ... and each one is having ~ 1000 elements. I also need to retrieve the indices relative to the input lists/ arrays that form the tuples.
(4) Clarification: (a) All such tuples which contain entries not differing by more than 1 are allowed.
(b) Tuples must have elements that are close to at least one other element by not more than 1.
(c) Entries within a tuple must belong to different input arrays/ lists.
Let me know if there are further clarifications needed!
You can use sorting to find results faster than a naive brute force. That being said, this assumes the number of output tuples is reasonably small. Otherwise, there is no way to find a solution in a reasonable time (it could take several months, for example). As @mosway pointed out in the comments, the number of combinations can be insanely huge, since the complexity is O(M ** N) (i.e. exponential in the number of lists), where N is the number of lists and M is the length of each list.
The idea is to use np.unique on all lists so as to get sorted arrays with unique items. Then you can iterate over the first array and, for each number n in it, find the range of values in the second array that fall within [n-1;n+1] using np.searchsorted. You can then iterate over the filtered values of the second array and recursively do the same on the remaining arrays.
Note that depending on which array is chosen first, the method can be significantly faster. Thus, a good heuristic could be to select an array containing values very distant from the others. Computing a distance matrix with all the values of all arrays and selecting the array with the biggest average distance should help.
Note also that using Numba should significantly speed up the recursive calls.
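A minimal sketch of the recursive range search is given below. To stay dependency-free it swaps in sorted(set(...)) and the standard bisect module for np.unique and np.searchsorted, and it assumes one reading of the requirement: one element is taken from each list in order, and each element differs from the previous one by at most tol. Index retrieval and shorter tuples such as (a, b) only are left out. The function name close_chains is made up.

```python
from bisect import bisect_left, bisect_right

def close_chains(lists, tol=1):
    """Tuples with one value per list, each within tol of the previous value."""
    arrays = [sorted(set(values)) for values in lists]
    results = []

    def extend(prefix, depth):
        if depth == len(arrays):
            results.append(tuple(prefix))
            return
        arr = arrays[depth]
        if prefix:
            # Binary search for the window [last - tol, last + tol].
            lo = bisect_left(arr, prefix[-1] - tol)
            hi = bisect_right(arr, prefix[-1] + tol)
        else:
            lo, hi = 0, len(arr)
        for value in arr[lo:hi]:
            extend(prefix + [value], depth + 1)

    extend([], 0)
    return results

a = [5, 2, 8, 3, 9]
b = [3, 5, 7, 6, 8]
c = [8, 5, 7, 4, 9]
d = close_chains([a, b, c])
print(d)
```

Each recursion level only visits values inside the window, so whole branches are pruned as soon as no neighbour exists in the next list.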

How would I compare a list (or equivalent) to another list in c++

I am attempting to learn C++ from scratch and possess a medium amount of python knowledge.
Here is some of my python code which takes a number, turns it into a list and checks if it contains all digits 0-9. If so it returns True, if not it returns False.
def val_checker(n):
    values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    lst = []
    for i in range(len(str(n))):
        lst.append((n // 10 ** i) % 10)
    lst = lst[::-1]
    return all(i in lst for i in values)
How would I achieve a similar thing in C++?
You would use the standard library container std::set or, better yet, std::unordered_set.
This container holds at most one of each distinct element; duplicate insertions are ignored.
So you can run through your original number in a loop, adding each digit to the set, and consider it a success if s.size() == 10 (assuming your set is called s).
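To make the logic concrete in the language the question started from, here is the same set-based idea sketched in Python. It is not C++ code; the loop body maps directly onto inserting into a std::unordered_set<int> and comparing its size with 10. The function name has_all_digits is made up.

```python
def has_all_digits(n):
    """True if the decimal representation of n contains every digit 0-9."""
    seen = set()
    n = abs(n)
    while n > 0:
        seen.add(n % 10)  # duplicates are ignored by the set
        n //= 10
    return len(seen) == 10

print(has_all_digits(1023456789), has_all_digits(112233))
```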

Python3 how to create a list of partial products

I have a very long list (of big numbers), let's say for example:
a=[4,6,7,2,8,2]
I need to get this output:
b=[4,24,168,336,2688,5376]
where each b[i]=a[0]*a[1]...*a[i]
I'm trying to do this recursively in this way:
b=[4] + [ a[i-1]*a[i] for i in range(1,6)]
but the (wrong) result is: [4, 24, 42, 14, 16, 16]
I don't want to compute all the products each time; I need an efficient way (if possible), because the list is very long.
At the moment this works for me:
b=[0]*6
b[0]=4
for i in range(1,6): b[i]=a[i]*b[i-1]
but it's too slow. Any ideas? Is it possible to avoid "for" or to speed it up in some other way?
You can calculate the product step-by-step since every next calculation heavily depends on the previous one.
What I mean is:
1) Compute the product of the first i - 1 numbers
2) The i-th product is then a[i] * the product from step 1
This method is called dynamic programming:
Dynamic programming (also known as dynamic optimization) is a method for solving a complex problem by breaking it down into a collection of simpler subproblems, solving each of those subproblems just once, and storing their solutions.
This is the implementation:
a = [4, 6, 7, 2, 8, 2]
b = []
product_so_far = 1
for i in range(len(a)):
    product_so_far *= a[i]
    b.append(product_so_far)
print(b)
This algorithm works in linear time (O(n)), which is the most efficient complexity you'll get for such a task
If you want a little optimization, you could generate the b list to the predefined length (b = [0] * len(a)) and, instead of appending, you would do this in a loop:
b[i] = product_so_far
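If the goal is mainly to avoid writing the loop by hand, the standard library already provides a running reduction. The sketch below uses itertools.accumulate with operator.mul; it is still O(n), and with very large integers the multiplications themselves will probably dominate the running time either way.

```python
from itertools import accumulate
from operator import mul

a = [4, 6, 7, 2, 8, 2]
# accumulate yields a[0], a[0]*a[1], a[0]*a[1]*a[2], ...
b = list(accumulate(a, mul))
print(b)  # [4, 24, 168, 336, 2688, 5376]
```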

Permutations including all members in Prolog

Extending on this question Can be this lisp function be implemented recursively? , is it possible to have all combinations in Prolog if the lists are represented like following:
items1(1, 2, 3).
items2(4, 5).
I can get the answer with following format of data:
items1(1).
items1(2).
items1(3).
items2(4).
items2(5).
?- findall((A,B), (items1(A), items2(B)), L).
L = [ (1, 4), (1, 5), (2, 4), (2, 5), (3, 4), (3, 5)].
I checked the permutation predicate from http://www.swi-prolog.org/pldoc/man?predicate=permutation/2 but that is for a different purpose.
You seem to have already solved your problem quite nicely, by normalizing your database. It is inconvenient to represent a list [I_1, I_2, ..., I_n] as a fact items(I_1, I_2, ..., I_n).. Lists are meant for (possibly ordered) collections that have 0 or more elements; facts are usually meant for tables with a preset number of columns, with one fact per row. And these two representations:
[a, b, c] (a list), and
item(a). item(b). item(c). (a table of facts)
are in fact very similar. Choosing one or another is a matter of how you want to use it, and it is quite easy (and common) to convert from the one to another in your program. For example, ?- member(X, [a, b, c]). is about the same as ?- item(X). when you have (2) defined. The Prolog list version of the question you linked would be:
combo(L1, L2, X-Y) :-
    member(X, L1),
    member(Y, L2).
And then:
?- combo([1,2,3], [4,5], R).
R = 1-4 ;
R = 1-5 ;
R = 2-4 ;
R = 2-5 ;
R = 3-4 ;
R = 3-5.
A bit less inconvenient (but still unnecessary) would be to write items([a, b, c])., so you have one argument, a list of all items. Then, you can do:
?- items(Is),
member(X, Is).
If you really have something like items(a, b, c)., and you know that you have exactly three items, you could still do:
?- items(A, B, C),
member(X, [A, B, C]).
If you don't know at compile time the arity of items, you need to examine the program at run time. If your program looks as in your question:
items1(1, 2, 3).
items2(4, 5).
Then you could for example do:
?- current_predicate(items1/N), % figure out the arity of items1
length(Is, N), % make a list of that length
Fact =.. [items1|Is], % create a callable term
Fact, % call the term to instantiate Is
member(X, Is). % enumerate the list
As you see, quite round-about.
One more comment: It is unusual to use (A,B) as a "tuple" in Prolog. For a pair, you would usually write A-B, and if you don't know how many elements you have, it's usually just a list, for example [A,B|Rest]. See this answer and the comments below it for more detail.

C++ multidimensional arrays

I was thinking about writing code that creates a Pascal triangle. I've done it, but then I thought about doing it better. One idea came to mind, but I couldn't find a proper answer for it. Is it possible to create an array that looks like this?
[1]|[1][1]|[1][2][1]|[1][3][3][1]|[1][4][6][4][1]| and so on? So my [1] would be (0,0) and [1][2][1] would be the elements of cells (2,0), (2,1), (2,2). I would be grateful for any advice.
You can implement triangle array through a single-dimension array. Fixed-size array may look like this:
template<typename T, size_t N>
struct TriangleArray {
    T& element(size_t i, size_t j)
    {
        if (i >= N || j >= N || i < j)
            throw std::out_of_range("incorrect index");
        return container[(i + 1) * i / 2 + j];
    }
private:
    T container[(N + 1) * N / 2];
};
No, it's not possible. In an array, all the elements must have the same type. Two-dimensional arrays are arrays of arrays, which means that in a multidimensional array all the rows must have the same length. You should probably use a
std::vector<std::vector<int> >
here. Or use a one-dimensional array and add the logic to compute the 1-dim position from the 2-dim index:
index = row*(row+1)/2 + column.
See iterate matrix without nested loop if you want the reverse indexing.
Edit: fixed my formula which was off by one. Here is a check in Python:
The following index function takes row, col and compute the corresponding index in a one dimensional array using my formula:
>>> index = lambda row, col: row*(row+1)/2 + col
Here are the coordinate pairs
>>> [[(i,j) for j in range(i+1)] for i in range(5)]
[[(0, 0)],
[(1, 0), (1, 1)],
[(2, 0), (2, 1), (2, 2)],
[(3, 0), (3, 1), (3, 2), (3, 3)],
[(4, 0), (4, 1), (4, 2), (4, 3), (4, 4)]]
I'm now checking that the corresponding index are the sequence of integer starting from 0 (indentation of the printing is mine):
>>> [[index(i,j) for j in range(i+1)] for i in range(5)]
[[0],
[1, 2],
[3, 4, 5],
[6, 7, 8, 9],
[10, 11, 12, 13, 14]]
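For the reverse direction (flat index back to row and column), one possible closed form inverts the triangular-number formula with an integer square root. This is a sketch of that idea, not necessarily what the linked question does; the name position is made up, and math.isqrt needs Python 3.8+.

```python
from math import isqrt

def index(row, col):
    """Flat position of (row, col) in a triangular layout."""
    return row * (row + 1) // 2 + col

def position(k):
    """Inverse of index: recover (row, col) from the flat position k."""
    # row is the largest r with r * (r + 1) / 2 <= k
    row = (isqrt(8 * k + 1) - 1) // 2
    col = k - row * (row + 1) // 2
    return row, col

print([position(k) for k in range(6)])
```

This lets a single loop over k walk the whole triangle without nested loops.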
The nicest thing would be to wrap the whole thing in a class called PascalTriangle and implement it along the following lines:
class PascalTriangle
{
private:
std::vector<std::vector<int> > m_data;
std::vector<int> CalculateRow(int row_index) const
{
// left as an exercise :)
}
public:
PascalTriangle(int num_rows) :
m_data()
{
assert(num_rows >= 0);
for (int row_index = 0; row_index < num_rows; ++row_index)
{
m_data.push_back(CalculateRow(row_index));
}
}
int operator()(int row_index, int column_index) const
{
assert(row_index >= 0 && row_index < m_data.size());
assert(column_index >= 0 && column_index <= row_index);
return m_data[row_index][column_index];
}
};
Now here comes the catch: this approach allows you to perform lazy evaluation. Consider the following case: you might not always need each and every value. For example, you may only be interested in the 5th row. Then why store the other, unused values?
Based on this idea, here's an advanced version of the previous class:
class PascalTriangle
{
private:
int m_num_rows;
std::vector<int> CalculateRow(int row_index) const
{
// left as an exercise :)
}
public:
PascalTriangle(int num_rows) :
m_num_rows(num_rows)
{
assert(num_rows >= 0);
// nothing is done here!
}
int operator()(int row_index, int column_index) const
{
assert(row_index >= 0 && row_index < m_num_rows);
assert(column_index >= 0 && column_index <= row_index);
return CalculateRow(row_index)[column_index];
}
};
Notice that the public interface of the class remains exactly the same, yet its internals are completely different. Such are the advantages of proper encapsulation. You effectively centralise error handling and optimisation points.
I hope these ideas inspire you to think more about the operations you want to perform with your Pascal triangle, because they will dictate the most appropriate data structure.
Edit: by request, here are some more explanations:
In the first version, m_data is a vector of vectors. Each contained std::vector<int> represents a row in the triangle.
The operator() function is a syntactical helper, allowing you to access PascalTriangle objects like this:
PascalTriangle my_triangle(10);
int i = my_triangle(3, 2);
assert makes sure that your code does not operate on illegal values, e.g. a negative row count or a row index greater than the triangle. But this is just one possible error reporting mechanism. You could also use exceptions, or error return values, or the Fallible idiom (std::optional). See past Stackoverflow questions for which error reporting mechanism to use when. This is a pure software-engineering aspect and has nothing to do with maths, but as you can imagine, it's, well, very important in software :)
CalculateRow returns a std::vector<int> representing the row specified by row_index. To implement it correctly, you'll need some maths. This is what I just found on Google: http://www.mathsisfun.com/pascals-triangle.html
In order to apply the maths, you'll want to know how to calculate n! in C++. There have been a lot of past Stackoverflow questions on this, for example here: Calculating large factorials in C++
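Factorials are not strictly required, though. A row can also be built incrementally from the identity C(n, k+1) = C(n, k) * (n - k) / (k + 1), which keeps the intermediate values small. Here is a sketch of that idea in Python (the function name pascal_row is made up); the same loop translates directly into a C++ CalculateRow.

```python
def pascal_row(n):
    """Return row n of Pascal's triangle (row 0 is [1])."""
    row = [1]
    for k in range(n):
        # C(n, k+1) = C(n, k) * (n - k) / (k + 1); the division is always exact
        row.append(row[-1] * (n - k) // (k + 1))
    return row

for n in range(5):
    print(pascal_row(n))
```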
Note that with the class approach, you can easily switch to another implementation later on. (You can even take it to the extreme and switch to a specific calculation algorithm based on the triangle height, without the users of the class ever noticing anything! See how powerful proper encapsulation can be?)
In the second version of the class, there is no permanent data storage anymore. CalculateRow is called only if and when needed, but the client of the class doesn't know this. As an additional possibly performance-improving measure, you could remember rows which you already calculated, for example by adding a private std::map<int, std::vector<int> > member variable whose int key represents the row index and whose values the rows. Every CalculateRow call would then first look if the result is already there, and add calculated ones at the end:
private:
mutable std::map<int, std::vector<int> > m_cache;
std::vector<int> CalculateRow(int row_index) const
{
// find the element at row_index:
std::map<int, std::vector<int> >::const_iterator cache_iter =
m_cache.find(row_index);
// is it there?
if (cache_iter != m_cache.end())
{
// return its value, no need to calculate it again:
return cache_iter->second;
}
// actual calculation of result left as an exercise :)
m_cache[row_index] = result;
return result;
}
By the way, this would also be a nice application of the new C++11 auto keyword. For example, you'd then just write auto cache_iter = m_cache.find(row_index);
And here's for another edit: I made m_cache mutable, because otherwise the thing wouldn't compile, as CalculateRow is a const member function (i.e. shouldn't change an object of the class from the client's point of view). This is a typical idiom for cache member variables.
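The lazy, cached design is easy to prototype before committing to the C++ version. The Python sketch below mirrors the class outline under a few assumptions: rows are computed with math.comb (Python 3.8+) instead of the exercise left open above, the cache is a plain dict, and the rows_computed counter exists only to show that the cache is hit.

```python
from math import comb

class PascalTriangle:
    def __init__(self, num_rows):
        assert num_rows >= 0
        self._num_rows = num_rows
        self._cache = {}          # row index -> list of values
        self.rows_computed = 0    # illustration only

    def _calculate_row(self, row_index):
        if row_index not in self._cache:
            self.rows_computed += 1
            self._cache[row_index] = [comb(row_index, k) for k in range(row_index + 1)]
        return self._cache[row_index]

    def __call__(self, row_index, column_index):
        assert 0 <= row_index < self._num_rows
        assert 0 <= column_index <= row_index
        return self._calculate_row(row_index)[column_index]

my_triangle = PascalTriangle(10)
print(my_triangle(3, 2), my_triangle(3, 1), my_triangle.rows_computed)
```

Only the rows that are actually requested are ever computed, and each of them only once.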