Suppose I have the following function (pseudocode):
bool checkObjects(a, b)
{
    if ((a.isValid() && a.hasValue()) ||
        (b.isValid() && b.hasValue()))
    {
        return true;
    }
    return false;
}
Which tests should I write to be able to claim that it's 100% covered?
There are 16 possible input combinations in total. Should I write 16 test cases, or should I try to be smart and omit some of them?
For example, should I write a test for
[a valid and has value, b valid and has value]
if I have already tested that it returns the expected result for
[a valid and has value, b invalid and has value]
and
[a invalid and has value, b valid and has value]
?
Thanks!
P.S.: Maybe someone can suggest good reading on unit testing approaches?
Test Driven Development by Kent Beck is well-done and is becoming a classic (http://www.amazon.com/Test-Driven-Development-Kent-Beck/dp/0321146530)
If you wanted to be thorough to the max then yes 16 checks would be worthwhile.
It depends. Speaking personally, I'd be satisfied to test all boundary conditions. So both cases where it is true, but making one item false would make the overall result false, and all 4 false cases where making one item true would make the overall result true. But that is a judgment call, and I wouldn't fault someone who did all 16 cases.
Incidentally, if you unit tested one true case and one false one, code coverage tools that measure line coverage would say that you have 100% coverage.
If you are worried about writing 16 test cases, you can try features like NUnit TestCase or MbUnit RowTest. Other languages/frameworks should have similar features.
This would allow you to test all 16 conditions with a single (and small) test case.
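As a rough illustration of the table-driven idea (sketched in Python rather than NUnit or MbUnit; the Obj class and the snake_case names are stand-ins invented for this example), all 16 combinations can be driven through one small loop. The expected results come from an explicit list of the seven combinations that should return true, not from re-running the same boolean expression:

```python
from itertools import product


class Obj:
    """Hypothetical stand-in for the objects passed to checkObjects."""

    def __init__(self, valid, has_value):
        self._valid = valid
        self._has_value = has_value

    def is_valid(self):
        return self._valid

    def has_value(self):
        return self._has_value


def check_objects(a, b):
    return (a.is_valid() and a.has_value()) or (b.is_valid() and b.has_value())


# (a_valid, a_has_value, b_valid, b_has_value) combinations expected to return True.
TRUE_CASES = {
    (True, True, False, False),
    (True, True, False, True),
    (True, True, True, False),
    (True, True, True, True),
    (False, False, True, True),
    (False, True, True, True),
    (True, False, True, True),
}


def run_all_cases():
    """Check every combination of the four flags; return how many were checked."""
    checked = 0
    for combo in product([False, True], repeat=4):
        a_valid, a_value, b_valid, b_value = combo
        actual = check_objects(Obj(a_valid, a_value), Obj(b_valid, b_value))
        assert actual == (combo in TRUE_CASES), combo
        checked += 1
    return checked
```

Calling run_all_cases() walks through all 16 rows; a real test framework's parameterized tests would report each row separately instead of stopping at the first failure.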
If testing seems hard, think about refactoring. I can see several approaches here. First merge isValid() and hasValue() into one method and test it separately. And why have checkObjects(a, b) testing two unrelated objects? Why can't you have checkObject(a) and checkObject(b), decreasing the exponential growth of possibilities further? Just a hint.
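A minimal sketch of that refactoring, in Python and under the assumption that an object can be reduced to a (valid, has_value) pair; check_object is the hypothetical single-object helper suggested above. Four cases cover the helper, and four more cover the "or" that combines two already-tested results, so 8 cases replace 16:

```python
def check_object(valid, has_value):
    """Hypothetical single-object check: usable only when valid and holding a value."""
    return valid and has_value


def check_objects(a, b):
    """a and b are (valid, has_value) pairs; passes if either one is usable."""
    return check_object(*a) or check_object(*b)


# Four cases cover check_object completely.
SINGLE_CASES = {
    (False, False): False,
    (False, True): False,
    (True, False): False,
    (True, True): True,
}

# Four more cover the combination of two already-tested results.
USABLE, UNUSABLE = (True, True), (False, True)
COMBINED_CASES = [
    (UNUSABLE, UNUSABLE, False),
    (UNUSABLE, USABLE, True),
    (USABLE, UNUSABLE, True),
    (USABLE, USABLE, True),
]


def run_cases():
    """Run both tables and return the total number of cases checked."""
    for (valid, has_value), expected in SINGLE_CASES.items():
        assert check_object(valid, has_value) == expected
    for a, b, expected in COMBINED_CASES:
        assert check_objects(a, b) == expected
    return len(SINGLE_CASES) + len(COMBINED_CASES)
```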
If you really want to test all 16 possibilities, consider some more table-ish tools, like Fitnesse (see http://fitnesse.org/FitNesse.UserGuide.FitTableStyles). Also check Parameterized JUnit runner and TestNG.
Related
Maybe somebody can help me understand the "Test Driven Development" method. I tried the following example by myself and I don't know where my understanding goes wrong.
Assume that we need a function that gives back the sum of two numbers a and b.
To ensure that the function works right, I write several tests, like creating the sum object, checking if a and b are numbers and so on ... but the first "real test" of calculating correctly is the following:
a=3
b=3
expected value: 6
The TDD method allows us to do only as much as is needed to make the test pass.
So the function looks like
sum(a, b){
return 6
}
The Test "3+3" will pass.
Next test is "4+10" maybe.
I'll run the tests and the last test will fail. What a surprise ...
I'll change my function to
sum(a, b){
if(a == 3 and b == 3)
return 6
else
return 14
}
The test will pass!
And so it goes on and on ... I will only add another case for every test. The function will pass every one of these tests, but it will fail for every case not listed, and the result is an ineffective and stupidly written function.
So is there a foolproof "trick" to not fall into this way of thinking?
I thought test-driven development was pretty straightforward and foolproof. Where is the "break even" point when it's time to say that this way of doing tests isn't practicable anymore and switch to the right solution
return a+b;
???
This is a very simple example, but I could imagine that there are more complex functions which are obviously not as easy to correct as this one.
Thanks
The TDD workflow has a 3-part cycle ("red, green, refactor") and it's important not to skip the third part. For example, after your second version:
sum(a, b){
if(a == 3 and b == 3)
return 6
else
return 14
}
You should look at this and ask: is there a simpler way to write this? Well, yes, there is:
sum(a, b){
return a+b
}
Of course, this is an unrealistic trivial example, but in real-life coding, this third step will guide you to refine your code into a well-written, tested final version.
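To make the refactor step concrete, here is a small Python sketch (the function and table names are invented for illustration). Both versions pass the two existing tests, which is exactly why the refactor is safe; a third test shows that only the generalized version keeps passing without another special case:

```python
def sum_hardcoded(a, b):
    # "Green" for the first two tests only: each new test would need a new branch.
    if a == 3 and b == 3:
        return 6
    return 14


def sum_refactored(a, b):
    # Result of the refactor step: the general rule the special cases pointed at.
    return a + b


CASES = [((3, 3), 6), ((4, 10), 14)]  # the two tests from the question
NEXT_CASE = ((1, 1), 2)               # a hypothetical third test


def passes(fn, cases):
    """True if fn returns the expected value for every (args, expected) pair."""
    return all(fn(*args) == expected for args, expected in cases)
```

Here passes(sum_hardcoded, CASES) and passes(sum_refactored, CASES) both hold, while passes(sum_hardcoded, [NEXT_CASE]) does not.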
The basic idea of writing tests is to know whether your system is behaving as expected or not. In tests we state expectations and assumptions. Basically, we do the following:
Set your expectations
Run the code
Check expectations against the actual output
We set our expectations for given conditions and test them against the actual output. As developers or product owners, we always know how the system should behave for any given condition, and we write tests accordingly.
For example, for the below given pseudo code:
int sum(int a, int b) {
return a + b;
}
Here the method sum should return the sum of the arguments a and b. We know that:
The arguments should always be integers.
The output should always be of integer type.
The output should be the sum of the two numbers a and b.
So we know exactly when it would fail, and we should write tests to cover at least 70% of those cases.
I am a PHP guy, so my examples are in PHP. As for ways to supply the arguments a and b, we have something called a data provider. In PHPUnit, the preferred way of passing different arguments is through a data provider. Visit the data provider sample and you will see the example for additions.
And so it goes on and on ... I will only add another case for every test. The function will pass every one of these tests, but it will fail for every case not listed, and the result is an ineffective and stupidly written function.
Yes, we try to cover as many of the cases as possible. The more cases the tests cover, the more confident we become in our code. Let's say we have written a method that returns the subsets of an array, each having 4 unique elements. Now how do you approach writing the test cases for it? One solution would be to compute the permutations and check that the length of each array does not exceed the maximum count (each element being unique).
Where is the "break even" point when it's time to say that this way of doing tests isn't practicable anymore and switch to the right solution
We don't have a break-even point in test cases. But we choose among different types of tests (unit tests, functional tests, behavioural tests). It is up to the developer which types of tests to implement, and the answer may vary depending on the type.
The best way is to practise TDD in projects. Until we do it in real projects, the confusion will remain. I myself had a very hard time understanding mocks and expectations. It's not something that can be learned overnight, so if you don't understand something, that's normal. Try it yourself, give yourself some time, experiment, ask friends, and just don't get exhausted. Always be curious.
Let us know if you are still confused about it.
I'm trying to replace some old unit tests with property-based testing (PBT), concretely with Scala and ScalaTest + ScalaCheck, but I think the problem is more general. The simplified situation is this: I have a method I want to test:
def upcaseReverse(s:String) = s.toUpperCase.reverse
Normally, I would have written unit tests like:
assertEquals("GNIRTS", upcaseReverse("string"))
assertEquals("", upcaseReverse(""))
// ... corner cases I could think of
So, for each test, I write the output I expect, no problem. Now, with PBT, it'd be like :
property("strings are reversed and upper-cased") {
forAll { (s: String) =>
assert ( upcaseReverse(s) == ???) //this is the problem right here!
}
}
As I try to write a test that will be true for all String inputs, I find myself having to write the logic of the method again in the test. In this case the test would look like:
assert ( upcaseReverse(s) == s.toUpperCase.reverse)
That is, I had to write the implementation in the test to make sure the output is correct.
Is there a way out of this? Am I misunderstanding PBT, and should I be testing other properties instead, like :
"strings should have the same length as the original"
"strings should contain all the characters of the original"
"strings should not contain lower case characters"
...
That is also plausible but sounds rather contrived and less clear. Can anybody with more experience in PBT shed some light here?
EDIT: following @Eric's sources I got to this post, and there's exactly an example of what I mean (at Applying the categories one more time): to test the method Times (in F#):
type Dollar(amount:int) =
member val Amount = amount
member this.Add add =
Dollar (amount + add)
member this.Times multiplier =
Dollar (amount * multiplier)
static member Create amount =
Dollar amount
the author ends up writing a test that goes like:
let ``create then times should be same as times then create`` start multiplier =
let d0 = Dollar.Create start
let d1 = d0.Times(multiplier)
let d2 = Dollar.Create (start * multiplier) // This one duplicates the code of Times!
d1 = d2
So, in order to test a method, the code of the method is duplicated in the test. In this case it is something as trivial as multiplying, but I think it extrapolates to more complex cases.
This presentation gives some clues about the kind of properties you can write for your code without duplicating it.
In general it is useful to think about what happens when you compose the method you want to test with other methods on that class:
size
++
reverse
toUpperCase
contains
For example:
upcaseReverse(y) ++ upcaseReverse(x) == upcaseReverse(x ++ y)
Then think about what would break if the implementation was broken. Would the property fail if:
1. size was not preserved?
2. not all characters were uppercased?
3. the string was not properly reversed?
Case 1 is actually implied by case 3, and I think that the property above would break for 3. However, it would not break for 2 (if there was no uppercasing at all, for example). Can we enhance it? What about:
upcaseReverse(y) ++ x.reverse.toUpperCase == upcaseReverse(x ++ y)
I think this one is ok but don't believe me and run the tests!
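One way to "run the tests" on the two properties above is to try them against a deliberately broken implementation. The sketch below is in Python rather than Scala, uses the standard random module instead of a property-testing library, and restricts inputs to ASCII letters; every helper name in it is made up for illustration:

```python
import random
import string


def upcase_reverse(s):
    return s.upper()[::-1]


def no_upcase_mutant(s):
    # Deliberately broken: reverses but forgets to upper-case.
    return s[::-1]


def random_ascii(rng, max_len=8):
    """Random string of ASCII letters, possibly empty."""
    length = rng.randint(0, max_len)
    return "".join(rng.choice(string.ascii_letters) for _ in range(length))


def holds_for_all(prop, fn, runs=200, seed=0):
    """True if prop(fn, x, y) holds for every randomly generated pair."""
    rng = random.Random(seed)
    return all(prop(fn, random_ascii(rng), random_ascii(rng)) for _ in range(runs))


def prop_concat(fn, x, y):
    # First property: fn(y) ++ fn(x) == fn(x ++ y)
    return fn(y) + fn(x) == fn(x + y)


def prop_concat_upper(fn, x, y):
    # Enhanced property: one side is upper-cased independently of fn.
    return fn(y) + x[::-1].upper() == fn(x + y)
```

With these definitions the mutant survives prop_concat but is caught by prop_concat_upper, which matches the reasoning above: the first property says nothing about uppercasing.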
Anyway I hope you get the idea:
1. compose with other methods
2. see if there are equalities which seem to hold (things like "round-tripping" or "idempotency" or "model-checking" in the presentation)
3. check if your property will break when the code is wrong
Note that 1. and 2. are implemented by a library named QuickSpec and 3. is "mutation testing".
Addendum
About your Edit: the Times operation is just a wrapper around * so there's not much to test. However in a more complex case you might want to check that the operation:
has a unit element
is associative
is commutative
is distributive with the addition
If any of these properties fails, this would be a big surprise. If you encode those properties as generic properties for any binary operation T x T -> T you should be able to reuse them very easily in all sorts of contexts (see the Scalaz Monoid "laws").
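A hedged sketch of such reusable laws, written in Python and checked by brute force over a small sample instead of with generated values (check_laws and its parameters are names invented for this example, not part of Scalaz or any library):

```python
from itertools import product


def check_laws(op, unit, samples, add=None):
    """Return the names of the laws that fail for `op` on the given samples."""
    failures = set()
    for a in samples:
        if op(a, unit) != a or op(unit, a) != a:
            failures.add("unit")
    for a, b in product(samples, repeat=2):
        if op(a, b) != op(b, a):
            failures.add("commutative")
    for a, b, c in product(samples, repeat=3):
        if op(op(a, b), c) != op(a, op(b, c)):
            failures.add("associative")
        # Distributivity over `add` is only checked when an addition is supplied.
        if add is not None and op(a, add(b, c)) != add(op(a, b), op(a, c)):
            failures.add("distributive")
    return failures
```

For integer multiplication, check_laws(lambda a, b: a * b, 1, range(-3, 4), add=lambda a, b: a + b) reports no failures, whereas the same call with subtraction and unit 0 reports the unit, commutative and associative laws as broken.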
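Coming back to your upcaseReverse example I would actually write 2 separate properties:
"upcaseReverse must uppercase the string" >> forAll { s: String =>
  // digits and punctuation are not upper case either, so check that nothing lower case remains
  // (a few Unicode letters have no upper-case form, so a restricted generator may still be needed)
  !upcaseReverse(s).exists(_.isLower)
}
"upcaseReverse reverses the string regardless of case" >> forAll { s: String =>
  upcaseReverse(s).toLowerCase === s.reverse.toLowerCase
}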
This doesn't duplicate the code and states 2 different things which can break if your code is wrong.
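For readers who want to try the same two properties outside Scala, here is an approximate Python rendering. It is only a sketch: it uses the standard random module rather than a property-testing library, and it deliberately limits inputs to ASCII letters, digits and spaces, since digits and spaces are neither upper nor lower case. The names are illustrative:

```python
import random
import string


def upcase_reverse(s):
    return s.upper()[::-1]


def prop_no_lowercase_left(s):
    # Uppercasing: no lower-case character may remain in the result.
    return not any(c.islower() for c in upcase_reverse(s))


def prop_reversed_ignoring_case(s):
    # Reversal, compared with a simpler "model" that ignores case.
    return upcase_reverse(s).lower() == s[::-1].lower()


def check(prop, runs=200, seed=1):
    """Return a counterexample string, or None if prop held on every run."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits + " "
    for _ in range(runs):
        s = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 10)))
        if not prop(s):
            return s
    return None
```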
In conclusion, I had the same question as you before and felt pretty frustrated about it, but after a while I found more and more cases where I was not duplicating my code in properties, especially when I started thinking about
combining the tested function with other functions (.isUpper in the first property)
comparing the tested function with a simpler "model" of computation ("reverse regardless of case" in the second property)
I have called this problem "convergent testing" but I can't figure out why or where the term comes from, so take it with a grain of salt.
For any test you run the risk of the complexity of the test code approaching the complexity of the code under test.
In your case, the code winds up being basically the same, which is just writing the same code twice. Sometimes there is value in that. For example, if you are writing code to keep someone in intensive care alive, you could write it twice to be safe. I wouldn't fault you for the abundance of caution.
For other cases there comes a point where the likelihood of the test breaking invalidates the benefit of the test catching real issues. For that reason, even if it is against best practice in other ways (enumerating things that should be calculated, not writing DRY code) I try to write test code that is in some way simpler than the production code, so it is less likely to fail.
If I cannot find a way to write test code that is simpler than the production code and also maintainable (read: "that I also like"), I move that test to a "higher" level (for example unit test -> functional test).
I just started playing with property based testing but from what I can tell it is hard to make it work with many unit tests. For complex units, it can work, but I find it more helpful at functional testing so far.
For functional testing you can often write the rule a function has to satisfy much more simply than you can write a function that satisfies the rule. This feels to me a lot like the P vs NP problem, where you can write a program to VALIDATE a solution in polynomial time, but all known programs to FIND a solution take much longer. That seems like a wonderful case for property testing.
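Sorting is a familiar small-scale instance of "the rule is simpler than the solution". The Python sketch below (is_valid_sort is a name chosen for this example) states the rule for any sorting routine without containing a sorting algorithm itself:

```python
from collections import Counter


def is_valid_sort(original, result):
    """Rule for a sort: result is ordered and holds exactly the original elements."""
    ordered = all(x <= y for x, y in zip(result, result[1:]))
    same_elements = Counter(original) == Counter(result)
    return ordered and same_elements
```

A property test would feed generated lists to the sort under test and assert is_valid_sort(data, my_sort(data)); the checker rejects both an unordered result such as [1, 3, 2] and an ordered one that dropped an element.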
Why is it that every unit testing framework (that I know of) requires the expected value in equality tests to always be the first argument:
Assert.AreEqual(42, Util.GetAnswerToLifeTheUniverseAndEverything());
assertEquals(42, Util.GetAnswerToLifeTheUniverseAndEverything());
etc.
I'm quite used to it now, but every coder I try to teach unit testing makes the mistake of reversing the arguments, which I understand perfectly. Google didn't help, maybe one of the hard-core unit-testers here knows the answer?
It seems that most early frameworks used expected before actual (for some unknown reason though, dice roll perhaps?). Yet with programming languages development, and increased fluency of the code, that order got reversed. Most fluent interfaces usually try to mimic natural language and unit testing frameworks are no different.
In the assertion, we want to assure that some object matches some conditions. This is the natural language form, as if you were to explain your test code you'd probably say
"In this test, I make sure that computed value is equal to 5"
instead of
"In this test, I make sure that 5 is equal to computed value".
Difference may not be huge, but let's push it further. Consider this:
Assert.That(Roses, Are(Red));
Sounds about right. Now:
Assert.That(Red, Are(Roses));
Hm..? You probably wouldn't be too surprised if somebody told you that roses are red. Other way around, red are roses, raises suspicious questions. Yoda, anybody?
Yoda's making an important point - reversed order forces you to think.
It gets even more unnatural when your assertions are more complex:
Assert.That(Forest, Has.MoreThan(15, Trees));
How would you reverse that one? More than 15 trees are being had by forest?
This claim (fluency as a driving factor for modification) is somehow reflected in the change that NUnit has gone through - originally (Assert.AreEqual) it used expected before actual (old style). Fluent extensions (or to use NUnit's terminology, constraint based - Assert.That) reversed that order.
I think it is just a convention now and as you said it is adopted by "every unit testing framework (I know of)". If you are using a framework it would be annoying to switch to another framework that uses the opposite convention. So (if you are writing a new unit testing framework for example) it would be preferable for you as well to follow the existing convention.
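Whatever the origin of the convention, mixing up the order has a practical cost: the failure message points the wrong way. A minimal Python sketch, using a hypothetical expected-first helper (not any real framework's API):

```python
def assert_equal(expected, actual):
    """Hypothetical helper using the expected-first convention."""
    if expected != actual:
        raise AssertionError(f"expected {expected!r} but was {actual!r}")


def failure_message(expected, actual):
    """Return the failure text, or None when the assertion passes."""
    try:
        assert_equal(expected, actual)
    except AssertionError as error:
        return str(error)
    return None


def answer():
    return 41  # deliberately wrong so that the assertion fails
```

failure_message(42, answer()) reads "expected 42 but was 41", while the reversed call failure_message(answer(), 42) reads "expected 41 but was 42", which sends the reader looking for the bug in the wrong place.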
I believe this comes from the way some developers prefer to write their equality tests:
if (4 == myVar)
This avoids an unwanted assignment from mistakenly writing one "=" instead of "==". In this case the compiler will catch the error and you will avoid a lot of trouble trying to fix a weird runtime bug.
Nobody knows and it is the source of never ending confusions. However not all frameworks follow this pattern (to a greater confusion):
FEST-Assert uses normal order:
assertThat(Util.GetAnswerToLifeTheUniverseAndEverything()).isEqualTo(42);
Hamcrest:
assertThat(Util.GetAnswerToLifeTheUniverseAndEverything(), equalTo(42))
ScalaTest doesn't really make a distinction:
Util.GetAnswerToLifeTheUniverseAndEverything() should equal (42)
I don't know but I've been part of several animated discussions about the order of arguments to equality tests in general.
There are a lot of people who think
if (42 == answer) {
doSomething();
}
is preferable to
if (answer == 42) {
doSomething();
}
in C-based languages. The reason for this is that if you accidentally put a single equals sign:
if (42 = answer) {
doSomething();
}
will give you a compiler error, but
if (answer = 42) {
doSomething();
}
might not, and would definitely introduce a bug that might be hard to track down. So who knows, maybe the person/people who set up the unit testing framework were used to thinking of equality tests in this way -- or they were copying other unit testing frameworks that were already set up this way.
I think it's because JUnit was the precursor of most unit testing frameworks (not that it was the first unit testing framework, but it kicked off an explosion in unit testing). Since JUnit did it that way, all the subsequent frameworks copied this form and it became a convention.
Why did JUnit do it that way? I don't know, ask Kent Beck!
My view is that this avoids exceptions, e.g. 42.equals(null) vs null.equals(42)
where 42 is the expected value and
null is the actual value.
Well they had to pick one convention. If you want to reverse it try the Hamcrest matchers. They are meant to help increase readability. Here is a basic sample:
import org.junit.Test;
import static org.junit.Assert.assertThat;
import static org.hamcrest.core.Is.is;

public class HamcrestTest {
    @Test
    public void matcherShouldWork() {
        // Math.pow returns a double, so compare against 8.0 rather than the int 8
        assertThat(Math.pow(2, 3), is(8.0));
    }
}
Surely it makes logical sense to put the expected value first, as it's the first known value.
Think about it in the context of manual tests. A manual test will have the expected value written in, with the actual value recorded afterwards.
I'm implementing automated testing with CppUTest in C++.
I realize I end up almost copying and pasting the logic to be tested on the tests themselves, so I can check the expected outcomes.
Am I doing it right? Should it be otherwise?
edit: I'll try to explain better:
The unit being tested takes input A, does some processing and returns output B.
So apart from making some black-box checks, like checking that the output lies in an expected range, I would also like to see if the output B that I got is the right outcome for input A, i.e. if the logic is working as expected.
So for example if the unit just makes A times 2 to yield B, then in the test I have no other way of checking than making again the calculation of A times 2 to check against B to be sure it went alright.
That's the duplication I'm talking about.
// Actual function being tested:
int times2( int a )
{
return a * 2;
}
// Test:
int test_a = 4; // sample input
int expected_b = test_a * 2; // here I'm duplicating times2()'s logic
int actual_b = times2( test_a );
CHECK( actual_b == expected_b );
PS: I think I will reformulate this in another question with my actual source code.
If your goal is to build automated tests for your existing code, you're probably doing it wrong. Hopefully you know what the result of frobozz.Gonkulate() should be for various inputs and can write tests to check that Gonkulate() is returning the right thing. If you have to copy Gonkulate()'s convoluted logic to figure out the answer, you might want to ask yourself how well you understand the logic to begin with.
If you're trying to do test-driven development, you're definitely doing it wrong. TDD consists of many quick cycles of:
1. Writing a test
2. Watching it fail
3. Making it pass
4. Refactoring as necessary to improve the overall design
Step 1 - writing the test first - is an essential part of TDD. I infer from your question that you're writing the code first and the tests later.
So for example if the unit just makes A times 2 to yield B, then in
the test I have no other way of checking than making again the
calculation of A times 2 to check against B to be sure it went
alright.
Yes you do! You know how to calculate A times two, so you don't need to do this in code. If A is 4 then you know the answer is 8, so you can just use that as the expected value.
CHECK( actual_b == 8 )
If you are worried about magic numbers, don't be. Nobody will be confused about the meaning of the hard-coded numbers in the following line:
CHECK( times2(4) == 8 )
If you don't know what the result should be then your unit test is useless. If you need to calculate the expected result, then you are either using the same logic as the function, or using an alternate algorithm to work out the result. In the first case, if the logic that you duplicate is incorrect, your test will still pass! In the second case, you are introducing another place for a bug to occur. If a test fails, you will need to work out whether it failed because the function under test has a bug, or because your test method has a bug.
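The first case can be shown in a few lines. This Python sketch plants a deliberate bug in times2 (the check_* helper names are made up) and compares a test that copies the logic with one that uses a hand-computed answer:

```python
def times2(a):
    return a + 2  # deliberate bug: should be a * 2


def check_with_duplicated_logic(a):
    expected = a + 2  # copied from the implementation, bug included
    return times2(a) == expected


def check_with_known_answer():
    return times2(4) == 8  # 8 was worked out by hand, independently of the code
```

check_with_duplicated_logic(4) passes even though the function is wrong, while check_with_known_answer() fails and exposes the bug.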
I think this one is a tough nut to crack because it's essentially a mentality shift. It was somewhat hard for me.
The thing about tests is to have your expectations nailed down and check whether your code really does what you think it does. Think of ways of exercising it, not checking its logic so directly, but as a whole. If that's too hard, maybe your function/method just does too much.
Try to think of your tests as working examples of what your code can do, not as a mathematical proof.
The programming language shouldn't matter.
var ANY_NUMBER = 4;
Assert.That(times_2(ANY_NUMBER), Is.EqualTo(ANY_NUMBER*2));
In this case, I wouldn't mind duplicating the logic. The expected value is more readable than a bare 8. Second, this logic doesn't look like a change magnet; it is relatively static.
For cases, where the logic is more involved (chunky) and prone to change, duplicating the logic in the test is definitely not recommended. Duplication is evil. Any change to the logic would ripple changes to the test. In that case, I'd use hardcoded input-expected output pairs with some readable pair-names.
Consider the following code (from a requirement that says that 3 is special for some reason):
bool IsSpecial(int value)
if (value == 3)
return true
else
return false
I would unit test this with a couple of functions - one called TEST(3IsSpecial) that asserts that when passed 3 the function returns true and another that passes some random value other than 3 and asserts that the function returns false.
When the requirement changes and say it now becomes 3 and 20 are special, I would write another test that verifies that when called with 20 this function returns true as well. That test would fail and I would then go and update the if condition in the function.
Now, what if there are people on my team who do not believe in unit testing and they make this change? They will go straight to the code and change it, and my second unit test might not test for 20 (it could be picking an int at random or have some other int hardcoded), so no test fails. Now my tests aren't in sync with the code. How do I ensure that when they change the code, some unit test or other fails?
I could be doing something grossly wrong here so any other techniques to get around this are also welcome.
That's a good question. As you note a Not3IsNotSpecial test picking a random non-3 value would be the traditional approach. This wouldn't catch a change in the definition of "special".
In a .NET environment you can use the new code contracts capability to write the test predicate (the postcondition) directly in the method. The static analyzer would catch the defect you proposed. For example:
Contract.Ensures(value == 3 || Contract.Result<Boolean>() == false);
I think anybody that's a TDD fan is experimenting with contracts now to see use patterns. The idea that you have tools to prove correctness is very powerful. You can even specify these predicates for an interface.
The only testing approach I've seen that would address this is Model Based Testing. The idea is similar to the contracts approach. You set up the Not3IsNotSpecial condition abstractly (e.g., IsSpecial(x => x != 3) == false) and let a model execution environment generate concrete tests. I'm not sure but I think these environments do static analysis as well. Anyway, you let the model execution environment run continuously against your SUT. I've never used such an environment, but the concept is interesting.
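Without a model execution environment, a much cruder approximation of the same idea is to state the requirement once as data and compare the function against it over a bounded range. This is only a sketch in Python, and the range, the SPECIAL set and the function names are all assumptions made for the example; it cannot see changes outside the range it scans:

```python
SPECIAL = {3}  # the requirement, stated once as data


def is_special(value):
    return value == 3


def is_special_changed(value):
    # Hypothetical untested change by a teammate: 20 is now special too.
    return value in (3, 20)


def mismatches(fn, low=-1000, high=1000):
    """Values in [low, high] where fn disagrees with the stated requirement."""
    return [v for v in range(low, high + 1) if fn(v) != (v in SPECIAL)]
```

mismatches(is_special) is empty, while mismatches(is_special_changed) returns [20], so the silent change makes a test fail until SPECIAL is updated on purpose.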
Unfortunately, that specific scenario is something that is difficult to guard against. With a function like IsSpecial, it's unrealistic to test all four billion negative test cases, so, no, you're not doing something grossly wrong.
Here's what comes to me off the top of my head. Many repositories have hooks that allow you to run some process on each check-in, such as running the unit tests. It's possible to set a criterion that newly checked in code must reach some threshold of code coverage under unit tests. If the commit does not meet certain metrics, it is rejected.
I've never had to set one of these systems up, so I don't know what is involved, but I do know it's possible.
And believe me, I feel your pain. I work with people who are similarly resistant to unit testing.
One thing you need to think about is why 3 is a special value and others are not. If it defines some aspect of your application, you can take that aspect out and make an enum out of it.
Now you can check that the method returns false if the value doesn't exist in the enum. For the enum class, write a test that checks its possible values. If a new possible value is added, that test should fail.
So your method will become:
bool IsSpecial(int value)
if (SpecialValues.has(value))
return true
else
return false
and your SpecialValues will be an enum like:
enum SpecialValues {
    Three(3), Twenty(20);
    public final int value;
    SpecialValues(int value) { this.value = value; }
}
and now you should write tests for the possible values of the enum. A simple test can check the total number of possible values, and another can check the possible values themselves.
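A rough Python equivalent of this arrangement, for illustration only (the has and declared_values helpers are names invented here, mirroring the SpecialValues.has call used above):

```python
from enum import Enum


class SpecialValues(Enum):
    THREE = 3
    TWENTY = 20

    @classmethod
    def has(cls, value):
        """True if some member of the enum carries this value."""
        return any(member.value == value for member in cls)


def is_special(value):
    return SpecialValues.has(value)


def declared_values():
    """The set of values the enum currently declares."""
    return {member.value for member in SpecialValues}
```

A test that pins declared_values() to {3, 20} and len(SpecialValues) to 2 fails as soon as someone adds or removes a member, which is the synchronisation the question asks for.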
The other point to make is that in a less contrived example:
20 might have been some valid condition to test for based on knowledge of the business domain. Writing tests in a BDD style based on knowledge of the business problem might have helped you explicitly catch it.
4 might have been a good value to test for due to its status as a boundary condition. This may have been more likely to change in the real world so would more likely show up in a full test case.