Should I unit test *what* or *how* for composite functions? - unit-testing

I have functions f and g, which already have a collection of unit tests ensuring that their behavior is correct for some known input-output pairs (plus exception handling, etc.).
Now I'm creating the function h(), as follows:
def h(x):
    return f(x) + g(2*x)
What's a good approach for unit testing this function? Would this change if h() was substantially more complex?
My thoughts so far:
I see two possibilities for testing h().
1. Testing if h() is doing the correct "plumbing"
Mock out f() so that it returns y_0 when called with input x_0; mock g() so that it returns z_0 when called with input 2*x_0. Then check that calling h(x_0) returns y_0+z_0 (a sketch of such a test follows after the disadvantages below).
Advantages:
Very simple to implement the test.
Can quickly find bugs where I incorrectly connected the outputs of f and g, or where I called them with wrong arguments (say, calling g(x) instead of g(2*x) in h()).
Disadvantages:
This tests how h() works, not what it should do. If I later want to refactor h(), then I'll probably need to rewrite these kinds of tests.
If the plumbing specified by the test does not produce the intended high-level behavior for h(), then these tests won't catch this error. For example, maybe the correct plumbing was supposed to be f(-x) + g(2*x), and I made it wrong both in the function definition and in the test definition.
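A minimal sketch of this plumbing-style test using Python's unittest.mock, assuming f, g, and h live in a hypothetical module called mymodule (the concrete numbers are arbitrary placeholders for x_0, y_0 and z_0):

import unittest
from unittest import mock

import mymodule  # hypothetical module defining f, g and h


class TestHPlumbing(unittest.TestCase):
    @mock.patch("mymodule.g")
    @mock.patch("mymodule.f")
    def test_h_wires_f_and_g_together(self, mock_f, mock_g):
        mock_f.return_value = 10   # y_0
        mock_g.return_value = 7    # z_0

        result = mymodule.h(3)     # x_0 = 3

        # h must call f(x_0) and g(2*x_0), then add the results.
        mock_f.assert_called_once_with(3)
        mock_g.assert_called_once_with(6)
        self.assertEqual(result, 17)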
2. Testing what h() should do
Let's say that the purpose of h() is to compute the sum of the first n primes, where n is its argument. In this case, a natural test suite for h() would involve testing it with known input-output pairs: something that makes sure that, for instance, h(1)=2, h(2)=5, h(5)=28, etc., without caring about how h() computes these numbers (a sketch of such a test follows after the disadvantages below).
Advantages:
This type of test checks that h() is indeed following its intended high-level behavior. Any plumbing mistakes that alter this will be caught.
Refactoring h() will probably not necessitate changing the test suite, and will even be easier, since the tests help us guarantee that the behavior of the function doesn't change.
Disadvantages:
In this simple example it's easy to produce such pairs, because the mapping that h() performs is not very complicated (just sum the first n primes). However, for a very complicated h(), my only option for producing such pairs might be to pick an input x and compute the correct output by hand. That doesn't seem reasonable if h() is very complicated.
Since coming up with known input-output pairs requires me to compute what f() and g() will produce given a certain input, there will probably be some duplication of effort, since I already spent some time doing that when creating the unit tests for these functions.
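For the prime-sum interpretation, a behaviour-level test could be a minimal sketch like the following, again assuming the hypothetical mymodule and using the pairs listed above plus h(3)=10:

import unittest

from mymodule import h  # hypothetical module defining h


class TestHBehaviour(unittest.TestCase):
    def test_sum_of_first_n_primes(self):
        # Known input-output pairs; how h() computes them is irrelevant here.
        cases = {1: 2, 2: 5, 3: 10, 5: 28}
        for n, expected in cases.items():
            with self.subTest(n=n):
                self.assertEqual(h(n), expected)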
Related question: Unit testing composite functions.
This question is at first glance very similar to mine. However, the two most upvoted answers present completely different approaches to the problem (the two approaches I mentioned above). My question is an attempt to clarify the pros/cons of each approach (and perhaps learn about other approaches), and potentially to establish which is best overall. If no approach is best in all cases, I would like to understand in which cases one should use each of them.

The thing is, you don't want to duplicate your test logic.
You might not want to include f and g logic inside your test for h.
In this case, mocking is well suited because it will allow you to only test h.
If f changes and has a regression, the tests for h won't fail, because h's own logic is still valid.
The problem with doing this is that you add more layers of (unit) tests.
The benefit, however, is that you completely isolate your tests, and the logic is only tested where it should be.
As with everything, there is a balance to find. But if your logic is complex and includes multiple collaborators (like f and g), then mocking can reduce test complexity.
If you're familiar with TDD, this is directly related to the Mockist vs. Classicist debate.
If you have time, I suggest you look at these videos about Outside-In TDD. You'll learn a lot.

It depends on how you interpret h:
Is it something that represents the combination of f and g, regardless of how f and g may behave in the future? Then you test the plumbing.
Is it something that produces a value, which at the moment you compute using f and g but could use an alternate implementation? Then test the output against the expected value given the input.


How to decide granularity when it comes to test cases?

I have written a permissions middleware for a Python/Django API. The permissions depend on the combination of 4 different attributes; let's call them a, b, c, and d.
Further, c and d can be 1) empty, 2) singular, or 3) multiple values, so the combinations increase even further.
Initially I decided to write a single test case which would generate the combinations and map the expected outcome using truth tables.
import itertools

# Generate all possible combinations
possible_combinations = list(itertools.product([0, 1], repeat=len(possible_values)))
for combination in possible_combinations:
    # Get expected result for this combination
    expected_result = get_expected_result(
        test_type=test_type, combination=combination, possible_values=possible_values)
Based on the combination values, each set to 0 or 1, I decided the values of a, b, c, and d.
Now I am doing this as part of a new team, and the team members came back with a criticism of the approach: it is not granular and doesn't follow the best practice of having a test case do one thing.
My view on best practices is that they should very clearly and obviously serve a purpose, and that purpose can depend on the task or team at hand. In this particular case the team members failed to suggest why splitting this particular test case into granular parts would help.
Ultimately I ended up writing up to 30 test cases with 80% redundant code.
Is there any objectivity to this debate or is this a very subjective call?
It all really depends on how you look at it. Sometimes it may make sense to test for more than one thing in a single unit test if they're too tightly coupled, but other times, it makes sense to test just one scenario.
Unit tests should be very explicit, and their purpose should be clear from the name. The approach you're using will turn that test method into a sort of black box, and it will be difficult to track down what is actually wrong if it fails. With a single test per case, you can easily understand where the bug is when one test fails, but in your case you would need some debugging to figure that out. These are some of the advantages of keeping tests as granular as possible.
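One way to keep the generated combinations and still get one clearly named, individually reported test per combination is parametrization. A minimal pytest sketch, where the module name and check_permissions are hypothetical stand-ins and get_expected_result is the truth-table helper from the question (with a simplified signature):

import itertools

import pytest

from permissions import check_permissions, get_expected_result  # hypothetical names

# 0/1 flags for the four attributes a, b, c, d.
COMBINATIONS = list(itertools.product([0, 1], repeat=4))


@pytest.mark.parametrize("combination", COMBINATIONS,
                         ids=lambda c: "a%d-b%d-c%d-d%d" % c)
def test_permission_combination(combination):
    # Each combination is reported as its own test, so a failure names
    # the offending combination instead of hiding inside a loop.
    expected = get_expected_result(combination=combination)  # simplified call for the sketch
    assert check_permissions(*combination) == expected

This keeps the truth-table generation from the original approach while satisfying the "one test, one thing" expectation, without 30 hand-written, largely redundant test cases.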

Test Driven Development Understanding Problems

Maybe somebody can help me understand the "Test Driven Development" method. I tried the following example by myself and I don't know where my understanding problem is.
Assume that we need a function that gives back the sum of two numbers a and b.
To ensure that the function works right, I write several tests, like creating the sum object, checking that a and b are numbers, and so on. But the first "real test" of correct calculation is the following:
a=3
b=3
expected value: 6
The TDD method only allows us to do just enough to make the test pass.
So the function looks like
sum(a, b){
    return 6
}
The Test "3+3" will pass.
Next test is "4+10" maybe.
I'll run the tests and the last test will fail. What a surprise ...
I'll change my function to
sum(a, b){
    if(a == 3 and b == 3)
        return 6
    else
        return 14
}
The test will pass!
And this goes on and on... I will only add another case for every test. The function will pass every one of these tests, but for every other, unlisted case it will not, and the result is an ineffective and stupidly written function.
So is there a foolproof "trick" to not fall into this way of thinking?
I thought test-driven development was pretty straightforward and foolproof. Where is the "break-even" point when it's time to say that this way of doing tests isn't practicable anymore and switch to the right solution
return a+b;
???
This is a very simple example, but I could imagine that there are more complex functions which are obviously not as easy to correct as this one.
Thanks
The TDD workflow has a 3-part cycle ("red, green, refactor"), and it's important not to skip the third part. For example, after your second version:
sum(a, b){
    if(a == 3 and b == 3)
        return 6
    else
        return 14
}
You should look at this and ask: is there a simpler way to write this? Well, yes, there is:
sum(a, b){
    return a+b
}
Of course, this is an unrealistically trivial example, but in real-life coding, this third step will guide you toward refining your code into a well-written, tested final version.
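To make that concrete in Python: the two tests from the example stay exactly as written, and only the implementation shrinks during the refactor step (a sketch; the function is named sum_numbers here to avoid shadowing Python's built-in sum):

def sum_numbers(a, b):
    # After refactoring: the simplest code that keeps both tests green.
    return a + b


def test_three_plus_three():
    assert sum_numbers(3, 3) == 6


def test_four_plus_ten():
    assert sum_numbers(4, 10) == 14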
The basic idea of writing tests is to know whether your system is behaving as expected or not. In a test we make expectations and assumptions. Basically, we do the following:
Set your expectations
Run the code
Check expectations against the actual output
We set our expectations for given conditions and test them against the actual output. As developers and product owners, we always know how the system should behave for any given condition, and we write tests accordingly.
For example, for the below given pseudo code:
int sum(int a, int b) {
    return a + b;
}
Here the method sum should return the sum of the arguments a and b. We know that:
The arguments should always be integers.
The output should always be an integer.
The output should be the sum of the two numbers a and b.
So we know exactly when it would fail, and we should write tests to cover at least 70% of those cases.
I am a PHP guy, so my examples are in PHP. Regarding ways to supply the arguments a and b, we have something called a data provider. I am giving PHP here as a reference: in PHPUnit the preferred way of passing different arguments is through a data provider. Visit the data provider sample and you will see the example for additions.
And this goes on and on... I will only add another case for every test. The function will pass every one of these tests, but for every other, unlisted case it will not, and the result is an ineffective and stupidly written function.
Yes, we try to cover as many of the cases as possible. The more tests covered, the more confident we become in our code. Let's say we have written a method that returns the subsets of an array, each having 4 unique elements in it. Now how do you approach writing the test cases for it? One solution would be to compute the combinations and check that the number of subsets does not exceed the maximum possible count (each element being unique).
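A minimal Python sketch of that idea, using a hypothetical four_element_subsets function and checking length/uniqueness/count invariants rather than enumerating every expected subset by hand:

from itertools import combinations
from math import comb


def four_element_subsets(items):
    # Hypothetical implementation under test.
    return [set(c) for c in combinations(items, 4)]


def test_four_element_subsets_invariants():
    items = [1, 2, 3, 4, 5, 6]
    subsets = four_element_subsets(items)
    # Each subset has exactly 4 unique elements drawn from the input.
    assert all(len(s) == 4 and s <= set(items) for s in subsets)
    # The number of subsets cannot exceed C(len(items), 4).
    assert len(subsets) <= comb(len(items), 4)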
Where is the "break even" point when its time to say, that this way of doing tests isn't practicable anymore and switch to the right solution
We don't have a break-even point in test cases. But we do make choices among different types of tests, namely unit tests, functional tests, and behavioural tests. It is up to the developer what type of tests should be implemented, and it may vary depending on the type of test.
The best way is to implement TDD in real projects; until we do, the confusion will remain. I myself had a very hard time getting to understand mocks and expectations. It's not something that can be learned overnight, so it's normal if you don't understand something. Try it yourself, give yourself some time, do experiments, ask friends, and don't get exhausted. Always be curious.
Let us know if you still have any confusion about it.

Unit testing with random data

I've read that generating random data in unit tests is generally a bad idea (and I do understand why), but testing on random data and then constructing a fixed unit test case from random tests which uncovered bugs seems nice. However I don't understand how to organize it nicely. My question is not related to a specific programming language or to a specific unit test framework actually, so I'll use python and some pseudo unit test framework. Here's how I see coding it:
def random_test_cases():
    datasets = [
        dataset1,
        dataset2,
        ...
        datasetn
    ]
    for dataset in datasets:
        assertTrue(...)
        assertEquals(...)
        assertRaises(...)
        # and so on
The problem is: when this test case fails I can't figure out which dataset caused the failure. I see two ways of solving it:
Create a single test case per dataset: the problem is the sheer number of test cases and the code duplication.
Usually the test framework lets us pass a message to the assert functions (in my example I could do something like assertTrue(..., message=str(dataset))). The problem is that I would have to pass such a message to every assert, which doesn't look elegant either.
Is there a simpler way of doing it?
I still think it's a bad idea.
Unit tests need to be straightforward. Given the same piece of code and the same unit test, you should be able to run it indefinitely and never get a different response unless there's an external factor coming into play. A goal contrary to this will increase the maintenance cost of your automation, which defeats the purpose.
Outside of the maintenance aspect, to me it seems lazy. If you put thought into your functionality and understand the positive as well as the negative test cases, developing unit tests is straightforward.
I also disagree with the user who shows how to do multiple test cases inside the same test case. When a test fails, you should be able to tell immediately which test failed and know why it failed. Tests should be as simple as you can make them and as concise/relevant to the code under test as possible.
You could define tests by extension instead of enumeration, or you could call multiple test cases from a single case.
Calling multiple test cases from a single test case:
MyTest()
{
    MyTest(1, "A")
    MyTest(1, "B")
    MyTest(2, "A")
    MyTest(2, "B")
    MyTest(3, "A")
    MyTest(3, "B")
}
And there are sometimes elegant ways to achieve this with some testing frameworks. Here is how to do it in NUnit:
[Test, Combinatorial]
public void MyTest(
    [Values(1,2,3)] int x,
    [Values("A","B")] string s)
{
    ...
}
I also think it's a bad idea.
Mind you, not throwing random data at your code, but having unit tests doing that. It all boils down to why you unit test in the first place. The answer is "to drive the design of the code". Random data doesn't drive the design of the code, because it depends on a very rigid public interface. Mind you, you can find bugs with it, but that's not what unit tests are about. And let me note that I'm talking about unit tests, and not tests in general.
That being said, I strongly suggest taking a look at QuickCheck. It's Haskell, so it's a bit dodgy on presentation and a bit PhD-ish on documentation, but you should be able to figure it out. I'm going to summarize how it works, though.
After you pick the code you want to test (let's say the sort() function), you establish invariants which should hold. In this example, you can have the following invariants if result = sort(input):
Every element in result should be smaller than or equal to the next one.
Every element in input should be present in result the same number of times.
result and input should have the same length (this repeats the previous one, but let's have it for illustration).
You encode each invariant in a simple function that takes the input and the result and checks whether the invariant holds.
Then, you tell QuickCheck how to generate input. Since this is Haskell and the type system kicks ass, it can see that the function takes a list of integers and it knows how to generate those. It basically generates random lists of random integers and random length. Of course, it can be more fine-grained if you have a more complex data type (for example, only positive integers, only squares, etc.).
Finally, when you have those two, you just run QuickCheck. It generates all that stuff randomly and checks the invariants. If some fail, it will show you exactly which ones. It would also tell you the random seed, so you can rerun this exact failure if you need to. And as an extra bonus, whenever it gets a failed invariant, it will try to reduce the input to the smallest possible subset that fails the invariant (if you think of a tree structure, it will reduce it to the smallest subtree that fails the invariant).
And there you have it. In my opinion, this is how you should go about testing stuff with random data. It's definitely not unit tests and I even think you should run it differently (say, have CI run it every now and then, as opposed to running it on every change (since it will quickly get slow)). And let me repeat, it's a different benefit from unit testing - QuickCheck finds bugs, while unit testing drives design.
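For readers who would rather stay in Python, the Hypothesis library offers a similar workflow: random generation of inputs, automatic shrinking of failing examples, and replay of previously failing cases. A minimal sketch of the sort() invariants described above:

from collections import Counter

from hypothesis import given
from hypothesis import strategies as st


@given(st.lists(st.integers()))
def test_sort_invariants(xs):
    result = sorted(xs)
    # Every element is smaller than or equal to the next one.
    assert all(a <= b for a, b in zip(result, result[1:]))
    # Same elements with the same multiplicities (which implies the same length).
    assert Counter(result) == Counter(xs)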
Usually the unit test frameworks support 'informative failures' as long as you pick the right assertion method.
However, if everything else fails, you could easily trace the dataset to the console or an output file. Low-tech, but it should work.
[TestCaseSource("GetDatasets")]
public Test.. (Dataset d)
{
Console.WriteLine(PrettyPrintDataset(d));
// proceed with checks
Console.WriteLine("Worked!");
}
In quickcheck for R we tried to solve this problem as follows:
1. The tests are actually pseudo-random (the seed is fixed), so you can always reproduce your test results (barring external factors, of course).
2. The test function returns enough data to reproduce the error, including the assertion that failed and the data that made it fail. A convenience function, repro, called on the return value of test will land you in the debugger at the beginning of the failing assertion, with arguments set to the witnesses of the failure. If the tests are executed in batch mode, equivalent information is stored in a file and the command to retrieve it is printed to stderr. Then you can call repro as before.
Whether or not you program in R, I would love to know whether this starts to address your requirements. Some aspects of this solution may be hard to implement in languages that are less dynamic or don't have first-class functions.

How do I avoid re-implementing the code being tested when I write tests?

(I'm using rspec in RoR, but I believe this question is relevant to any testing system.)
I often find myself doing this kind of thing in my test suite:
actual_value = object_being_tested.tested_method(args)
expected_value = compute_expected_value(args)
assert(actual_value == expected_value, ...)
The problem is that my implementation of compute_expected_value() often ends up mimicking object_being_tested.tested_method(), so it's really not a good test because they may have identical bugs.
This is a rather open-ended question, but what techniques do people use to avoid this trap? (Points awarded for pointers to good treatises on the topic...)
Usually (for manually written unit tests) you would not compute the expected value. Rather, you would just assert against what you expect to be the result from the tested method for the given args. That is, you would have something like this:
actual_value = object_being_tested.tested_method(args)
expected_value = what_you_expect_to_be_the_result
assert(actual_value == expected_value, ...)
In other testing scenarios where the arguments (or even test methods being executed) are generated automatically, you need to devise a simple oracle which will give you the expected result (or an invariant that should hold for the expected result). Tools like Pex, Randoop, ASTGen, and UDITA enable such testing.
Well, here are my two cents:
a) If the calculation of the expected value is simple and does not encompass any business rules/conditions beyond the test case for which it is generating the expected result, then it should be good enough. Remember, your actual code will be as generic as possible.
There are cases where you will run into issues in the expected-value calculation, but you can easily pinpoint the cause of failure and fix it.
b) There are cases when the expected value cannot easily be calculated. In that case you could have flat files with the results, or some kind of constant expected value, as naturally you would want.
Also, there are tests where you just want to verify whether a particular method was called or not, and then you are done testing that unit. Remember to use all these different paradigms while testing, and always remember: KEEP IT SIMPLE.
You would not do that.
You do not compute the expected value; you know it already. It should be a constant value defined in your test (or constructed from other functions that have already been tested).

Automated testing feels a lot like duplicating the tested logic, am I doing it right?

I'm implementing automated testing with CppUTest in C++.
I realize I end up almost copying and pasting the logic to be tested into the tests themselves, so I can check the expected outcomes.
Am I doing it right? Should it be otherwise?
edit: I'll try to explain better:
The unit being tested takes input A, does some processing, and returns output B.
So apart from making some black-box checks, like checking that the output lies within an expected range, I would also like to see if the output B that I got is the right outcome for input A, i.e. if the logic is working as expected.
So, for example, if the unit just computes A times 2 to yield B, then in the test I have no way of checking other than repeating the calculation of A times 2 and checking it against B to be sure it went right.
That's the duplication I'm talking about.
// Actual function being tested:
int times2( int a )
{
    return a * 2;
}
// Test:
int test_a = 5;                   // some sample input
int expected_b = test_a * 2;      // here I'm duplicating times2()'s logic
int actual_b = times2( test_a );
CHECK( actual_b == expected_b );
PS: I think I will reformulate this in another question with my actual source code.
If your goal is to build automated tests for your existing code, you're probably doing it wrong. Hopefully you know what the result of frobozz.Gonkulate() should be for various inputs and can write tests to check that Gonkulate() is returning the right thing. If you have to copy Gonkulate()'s convoluted logic to figure out the answer, you might want to ask yourself how well you understand the logic to begin with.
If you're trying to do test-driven development, you're definitely doing it wrong. TDD consists of many quick cycles of:
Writing a test
Watching it fail
Making it pass
Refactoring as necessary to improve the overall design
Step 1 - writing the test first - is an essential part of TDD. I infer from your question that you're writing the code first and the tests later.
So, for example, if the unit just computes A times 2 to yield B, then in the test I have no way of checking other than repeating the calculation of A times 2 and checking it against B to be sure it went right.
Yes you do! You know how to calculate A times two, so you don't need to do this in code. If A is 4, then you know the answer is 8. So you can just use it as the expected value.
CHECK( actual_b == 8 )
If you are worried about magic numbers, don't be. Nobody will be confused about the meaning of the hard-coded numbers in the following line:
CHECK( times_2(4) == 8 )
If you don't know what the result should be, then your unit test is useless. If you need to calculate the expected result, then you are either using the same logic as the function, or using an alternate algorithm to work out the result. In the first case, if the logic you duplicate is incorrect, your test will still pass! In the second case, you are introducing another place for a bug to occur. If a test fails, you will need to work out whether it failed because the function under test has a bug, or because your test method has a bug.
I think this one is a tough nut to crack because it's essentially a mentality shift. It was somewhat hard for me.
The thing about tests is to have your expectations nailed down and to check whether your code really does what you think it does. Think of ways of exercising it, not checking its logic so directly, but as a whole. If that's too hard, maybe your function/method just does too much.
Try to think of your tests as working examples of what your code can do, not as a mathematical proof.
The programming language shouldn't matter.
var ANY_NUMBER = 4;
Assert.That(times_2(ANY_NUMBER), Is.EqualTo(ANY_NUMBER*2));
In this case, I wouldn't mind duplicating the logic. The expected value is more readable than a bare 8. Second, this logic doesn't look like a change magnet; it's relatively static.
For cases where the logic is more involved (chunky) and prone to change, duplicating the logic in the test is definitely not recommended. Duplication is evil. Any change to the logic would ripple into changes to the test. In that case, I'd use hardcoded input/expected-output pairs with some readable pair names, as in the sketch below.
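A minimal Python sketch of that last suggestion (the question uses C++/CppUTest, but the idea is language-neutral; times_2 is re-declared here so the sketch is self-contained):

import unittest


def times_2(a):
    return a * 2


# Hardcoded input -> expected-output pairs with readable names, so a failing
# case identifies itself without re-deriving the logic inside the test.
CASES = {
    "zero_stays_zero": (0, 0),
    "positive_number_doubles": (4, 8),
    "negative_number_doubles": (-3, -6),
}


class TestTimes2(unittest.TestCase):
    def test_known_cases(self):
        for name, (value, expected) in CASES.items():
            with self.subTest(name=name):
                self.assertEqual(times_2(value), expected)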