Let's say I would like to change my NUnit parameterized test method into a theory. As far as theories go, they should define all assumptions/preconditions under which their assertions will pass. As per the NUnit documentation:
[when comparing theory to parametrized test] A theory, on the other hand, makes a general statement that all of its assertions will pass for all arguments satisfying certain assumptions.
But as I understand it, this means that the code called by the test would essentially have to be translated into assumptions. Completely.
What's the point of having theories, then? Our algorithm would be written twice: first as testable code, and second as theory assumptions. So if we introduced a bug into the algorithm, both our code and our test would likely share the same bug. What's the point?
Example for better understanding
Let's say we're having a checksum method that only supports digits and we'd like to test it using a theory. Let's write a theory:
static Regex rx = new Regex(@"^\d+$", RegexOptions.Compiled);
[Theory]
public void ChecksumTheory(string value)
{
Assume.That(!string.IsNullOrWhiteSpace(value));
Assume.That(value.Length > 1); // one single number + checksum = same number twice
Assume.That(rx.IsMatch(value));
var cc = new ChecksumValidator();
bool result = cc.ValidateValue(value);
Assert.IsTrue(result); // not really as algorithm assumptions are missing
}
This is a pretty nice theory, except that its assertions still won't pass: without translating the tested algorithm itself into explicit assumptions, we can't know what the outcome of the validation will be.
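For this particular example, one way out is a round-trip property: instead of re-deriving the checksum rules as assumptions, pair the validator with a checksum generator and assert that they agree. This is only a sketch; ComputeChecksum is a hypothetical method (the snippet above only shows ValidateValue), and datapoints are omitted:
[Theory]
public void ValueWithAppendedChecksumAlwaysValidates(string payload)
{
    Assume.That(!string.IsNullOrWhiteSpace(payload));
    Assume.That(payload.All(char.IsDigit)); // digits only, as before

    var cc = new ChecksumValidator();
    // ComputeChecksum is assumed here as the natural counterpart of ValidateValue
    string withChecksum = payload + cc.ComputeChecksum(payload);

    Assert.IsTrue(cc.ValidateValue(withChecksum));
}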
Additional info
Theories seem rather trivial and concise when we only need to provide assumptions about input state, namely checking that particular values are set correctly or that their combination is valid:
[Theory]
public void Person_ValidateState(Person input)
{
Assume.That(input.Age < 110);
Assume.That(input.Birth < input.Death || input.Death == null);
...
}
Questions
Why write unit test theories if one needs to provide enough assumptions for all asserts to pass?
If we don't want to reinvent the wheel by providing all algorithm assumptions, how do we provide correct assumptions?
If that's not the case, how should I rewrite my theory to make it a good example of NUnit theories?
What is the intended use (by their creators) of test theories anyway?
Theories vs. parameterized tests
I am also aiming to introduce assumptions in my tests instead of using parameterized tests, but I haven't started yet due to similar doubts.
The goal of assumptions is to describe the given input as a subset of a vast but complete set of values by applying a filter. On that view your code above is absolutely correct; nevertheless, you would have to write several similar tests for negative results, e.g. where the outcome of cc.ValidateValue(...) is false. Once again, for comprehensibility, I would still rely on a good choice of hand-picked parameters in a parameterized test of this trivial function.
On the other hand assumptions may be useful for tests of more complex business logic. Imagine you have a garage full of fancy cars and you feel like smashing the gas on some remote terrain - also let's imagine this is a business requirement so you need to write tests for it (how cool would this be!). Then you could write a test like this:
[Theory]
public void CarCanDriveOnMuddyGround(Car car)
{
Assume.That(car.IsFourWheelDrive);
Assume.That(car.HasMotor);
Assume.That(car.MaxSpeed > 50);
Assume.That(car.Color != "white");
bool result = car.DriveWithGivenSpeedOn<MuddyGround>(50);
Assert.IsTrue(result);
}
See how strongly this relates to the BDD approach? Like you, I am not that convinced about using assumptions for plain unit tests either. But I am certain it's a good idea to use different approaches to test functions (parameterized, assumption-based) according to the different test levels (unit, integration, system, user acceptance).
About algorithm details in assumptions
I thought about your specific problem again, and now I get your point. In my words: you would need to assume that a given value yields a positive result before you can assert that it yields a positive result. Right? I think you found a pretty good example of why theories do not always work.
I tried to solve it anyway in a slightly simpler example (for readability). But I admit it's not very convincing:
public class TheoryTests
{
[Datapoints]
public string[] InvalidValues = new[] { null, string.Empty };
[Datapoints]
public string[] PositiveValues = new[] { "good" };
[Datapoints]
public string[] NegativeValues = new[] { "Bad" };
private bool FunctionUnderTest(string value)
{
return value.ToLower().Equals(value);
}
[Theory]
public void PositiveTest(string value)
{
Assume.That(!string.IsNullOrEmpty(value));
var result = FunctionUnderTest(value);
Assert.True(result);
}
[Theory]
public void PassingPositiveTest(string value)
{
Assume.That(!string.IsNullOrEmpty(value));
Assume.That(!NegativeValues.Contains(value));
var result = FunctionUnderTest(value);
Assert.True(result);
}
}
PositiveTest will obviously fail, because the algorithm assumption is missing.
See the second assumption in the body of PassingPositiveTest, which prevents the test from failing. The downside, of course, is that this is effectively an example-based test rather than a pure theory-based test. Better ideas welcome.
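One more sketch along those lines: instead of listing negative examples as datapoints, partition the whole input domain with a predicate. This still mirrors the implementation, which is the original complaint, but it removes the hand-picked example list:
[Theory]
public void PartitionedTheory(string value)
{
    Assume.That(!string.IsNullOrEmpty(value));

    bool result = FunctionUnderTest(value);

    // The partition predicate mirrors the implementation, so a shared bug
    // would still go unnoticed; the gain is coverage of every datapoint.
    if (value == value.ToLower())
        Assert.True(result);
    else
        Assert.False(result);
}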
Related
I'm trying to replace some old unit tests with property-based testing (PBT), concretely with Scala and ScalaTest/ScalaCheck, but I think the problem is more general. The simplified situation: I have a method I want to test:
def upcaseReverse(s:String) = s.toUpperCase.reverse
Normally, I would have written unit tests like:
assertEquals("GNIRTS", upcaseReverse("string"))
assertEquals("", upcaseReverse(""))
// ... corner cases I could think of
So, for each test I write the output I expect, no problem. Now, with PBT, it'd be like:
property("strings are reversed and upper-cased") {
forAll { (s: String) =>
assert ( upcaseReverse(s) == ???) //this is the problem right here!
}
}
As I try to write a test that will be true for all String inputs, I find myself having to write the logic of the method again in the test. In this case the test would look like:
assert ( upcaseReverse(s) == s.toUpperCase.reverse)
That is, I had to write the implementation in the test to make sure the output is correct.
Is there a way out of this? Am I misunderstanding PBT, and should I be testing other properties instead, like :
"strings should have the same length as the original"
"strings should contain all the characters of the original"
"strings should not contain lower case characters"
...
That is also plausible, but it sounds rather contrived and less clear. Can anybody with more experience in PBT shed some light here?
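For reference, here is how two of those candidate properties could look as NUnit theories (C#, matching the first part of this thread), against an assumed port of upcaseReverse; neither restates the implementation, and datapoints are omitted:
using System.Linq;

static string UpcaseReverse(string s) =>
    new string(s.ToUpperInvariant().Reverse().ToArray());

[Theory]
public void PreservesLength(string s)
{
    Assume.That(s != null);
    // Exotic Unicode case mappings could break this; it holds for ASCII.
    Assert.AreEqual(s.Length, UpcaseReverse(s).Length);
}

[Theory]
public void ProducesNoLowerCase(string s)
{
    Assume.That(s != null);
    Assert.IsFalse(UpcaseReverse(s).Any(char.IsLower));
}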
EDIT: following @Eric's sources I got to this post, and there's exactly an example of what I mean (at "Applying the categories one more time"): to test the Times method (in F#):
type Dollar(amount:int) =
member val Amount = amount
member this.Add add =
Dollar (amount + add)
member this.Times multiplier =
Dollar (amount * multiplier)
static member Create amount =
Dollar amount
the author ends up writing a test that goes like:
let ``create then times should be same as times then create`` start multiplier =
let d0 = Dollar.Create start
let d1 = d0.Times(multiplier)
let d2 = Dollar.Create (start * multiplier) // This one duplicates the code of Times!
d1 = d2
So, in order to test the method, the code of the method is duplicated in the test. In this case it's something as trivial as multiplication, but I think the problem extrapolates to more complex cases.
This presentation gives some clues about the kind of properties you can write for your code without duplicating it.
In general it is useful to think about what happens when you compose the method you want to test with other methods on that class:
size
++
reverse
toUpperCase
contains
For example:
upcaseReverse(y) ++ upcaseReverse(x) == upcaseReverse(x ++ y)
Then think about what would break if the implementation was broken. Would the property fail if:
size was not preserved?
not all characters were uppercased?
the string was not properly reversed?
1. is actually implied by 3., and I think the property above would break for 3. However, it would not break for 2. (if there was no uppercasing at all, for example). Can we enhance it? What about:
upcaseReverse(y) ++ x.reverse.toUpperCase == upcaseReverse(x ++ y)
I think this one is ok but don't believe me and run the tests!
Anyway I hope you get the idea:
compose with other methods
see if there are equalities which seem to hold (things like "round-tripping" or "idempotency" or "model-checking" in the presentation)
check if your property will break when the code is wrong
Note that 1. and 2. are implemented by a library named QuickSpec and 3. is "mutation testing".
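As a concrete instance of round-tripping, sketched with the assumed UpcaseReverse port from earlier: applying the function twice must reproduce the upper-cased input, because the two reversals cancel out (true for ASCII, where upper-casing is per-character):
[Theory]
public void RoundTripIsUpperCase(string s)
{
    Assume.That(s != null && s.All(c => c < 128)); // ASCII only
    Assert.AreEqual(s.ToUpperInvariant(), UpcaseReverse(UpcaseReverse(s)));
}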
Addendum
About your Edit: the Times operation is just a wrapper around * so there's not much to test. However in a more complex case you might want to check that the operation:
has a unit element
is associative
is commutative
distributes over addition
If any of these properties fails, it would be a big surprise. If you encode those properties as generic properties for any binary operation T x T -> T, you should be able to reuse them very easily in all sorts of contexts (see the Scalaz Monoid "laws").
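Sketched generically in C# (the helper names are invented for illustration), such laws become reusable against any operation:
static void AssertAssociative<T>(Func<T, T, T> op, T a, T b, T c) =>
    Assert.AreEqual(op(op(a, b), c), op(a, op(b, c)));

static void AssertHasUnit<T>(Func<T, T, T> op, T unit, T a)
{
    Assert.AreEqual(a, op(unit, a)); // left identity
    Assert.AreEqual(a, op(a, unit)); // right identity
}

// e.g. against integer addition:
// AssertAssociative((x, y) => x + y, 1, 2, 3);
// AssertHasUnit((x, y) => x + y, 0, 42);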
Coming back to your upcaseReverse example, I would actually write 2 separate properties (note the first checks for the absence of lowercase characters, since digits and punctuation are neither upper nor lower case):
"upcaseReverse must leave no lowercase characters" >> forAll { s: String =>
  upcaseReverse(s).forall(c => !c.isLower)
}
"upcaseReverse reverses the string regardless of case" >> forAll { s: String =>
  upcaseReverse(s).toLowerCase === s.reverse.toLowerCase
}
This doesn't duplicate the code and states 2 different things which can break if your code is wrong.
In conclusion, I had the same question as you and felt pretty frustrated about it, but after a while I found more and more cases where I was not duplicating my code in properties, especially when I started thinking about:
combining the tested function with other functions (isLower in the first property)
comparing the tested function against a simpler "model" of computation ("reverse regardless of case" in the second property)
I have called this problem "convergent testing", but I can't figure out why or where the term comes from, so take it with a grain of salt.
For any test, you run the risk of the complexity of the test code approaching the complexity of the code under test.
In your case, the test winds up being basically the same as the production code, which means writing the same code twice. Sometimes there is value in that: for example, if you are writing code to keep someone in intensive care alive, you could write it twice to be safe, and I wouldn't fault you for the abundance of caution.
In other cases, there comes a point where the likelihood of the test itself breaking outweighs the benefit of the test catching real issues. For that reason, even if it goes against best practice in other ways (enumerating things that could be calculated, not writing DRY code), I try to write test code that is in some way simpler than the production code, so it is less likely to be wrong.
If I cannot find a way to write test code that is simpler than the production code and still maintainable (read: "that I also like"), I move that test to a "higher" level, for example from unit test to functional test.
I just started playing with property-based testing, but from what I can tell it is hard to make it work for many unit tests. For complex units it can work, but so far I find it more helpful for functional testing.
For functional testing, you can often state the rule a function has to satisfy much more simply than you can write a function that satisfies the rule. This feels to me a lot like the P vs. NP problem, where you can write a program to VALIDATE a solution in polynomial time while all known programs to FIND a solution take much longer. That seems like a wonderful fit for property testing.
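As a hypothetical illustration of that asymmetry (not from the original post): verifying a claimed subset-sum solution takes two lines, while writing the solver is the hard part, and the property only ever needs the cheap side:
using System.Linq;

// Checking a proposed solution is cheap even though finding one is hard.
// Ignores element multiplicity for brevity.
static bool IsValidSubsetSum(int[] candidate, int[] pool, int target) =>
    candidate.Sum() == target && candidate.All(x => pool.Contains(x));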
I've read that generating random data in unit tests is generally a bad idea (and I do understand why), but testing on random data and then turning the random runs that uncovered bugs into fixed unit test cases seems nice. However, I don't understand how to organize it nicely. My question isn't tied to a specific programming language or unit test framework, so I'll use Python and a pseudo unit test framework. Here's how I see coding it:
def random_test_cases():
    datasets = [
        dataset1,
        dataset2,
        ...
        datasetn
    ]
    for dataset in datasets:
        assertTrue(...)
        assertEquals(...)
        assertRaises(...)
        # and so on
The problem is: when this test case fails I can't figure out which dataset caused failure. I see two ways of solving it:
Create a single test case per dataset: the problem is the sheer number of test cases and the code duplication.
Usually the test framework lets us pass a message to assert functions (in my example I could do something like assertTrue(..., message=str(dataset))). The problem is that I would have to pass such a message to every assert, which doesn't look elegant either.
Is there a simpler way of doing it?
I still think it's a bad idea.
Unit tests need to be straightforward. Given the same piece of code and the same unit test, you should be able to run it any number of times and never get a different result unless an external factor comes into play. A goal contrary to this increases the maintenance cost of your automation, which defeats the purpose.
Outside of the maintenance aspect, it seems lazy to me. If you put thought into your functionality and understand the positive as well as the negative test cases, developing unit tests is straightforward.
I also disagree with the user who shows how to run multiple test cases inside the same test case. When a test fails, you should be able to tell immediately which test failed and why. Tests should be as simple as you can make them and as concise/relevant to the code under test as possible.
You could define tests by extension instead of enumeration, or you could call multiple test cases from a single case.
Calling multiple test cases from a single test case:
MyTest()
{
MyTest(1, "A")
MyTest(1, "B")
MyTest(2, "A")
MyTest(2, "B")
MyTest(3, "A")
MyTest(3, "B")
}
And there are sometimes elegant ways to achieve this with some testing frameworks. Here is how to do it in NUnit:
[Test, Combinatorial]
public void MyTest(
[Values(1,2,3)] int x,
[Values("A","B")] string s)
{
...
}
I also think it's a bad idea.
Mind you, not throwing random data at your code, but having unit tests doing that. It all boils down to why you unit test in the first place. The answer is "to drive the design of the code". Random data doesn't drive the design of the code, because it depends on a very rigid public interface. Mind you, you can find bugs with it, but that's not what unit tests are about. And let me note that I'm talking about unit tests, and not tests in general.
That being said, I strongly suggest taking a look at QuickCheck. It's Haskell, so it's a bit dodgy on presentation and a bit PhD-ish on documentation, but you should be able to figure it out. I'm going to summarize how it works, though.
After you pick the code you want to test (let's say the sort() function), you establish invariants which should hold. In this example, if result = sort(input), you could have the following invariants:
Every element in result should be smaller than or equal to the next one.
Every element in input should be present in result the same number of times.
result and input should have the same length (this is implied by the previous one, but let's keep it for illustration).
You encode each invariant in a simple function that takes the input and the result and checks whether the invariant holds.
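In C# terms (a sketch; QuickCheck itself is Haskell), those invariant checkers might look like:
using System.Linq;

static bool IsNonDecreasing(int[] result) =>
    result.Zip(result.Skip(1), (a, b) => a <= b).All(ok => ok);

static bool IsPermutationOf(int[] input, int[] result) =>
    input.OrderBy(x => x).SequenceEqual(result.OrderBy(x => x)); // same elements, same counts

static bool SameLength(int[] input, int[] result) =>
    input.Length == result.Length;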
Then, you tell QuickCheck how to generate input. Since this is Haskell and the type system kicks ass, it can see that the function takes a list of integers and it knows how to generate those. It basically generates random lists of random integers and random length. Of course, it can be more fine-grained if you have a more complex data type (for example, only positive integers, only squares, etc.).
Finally, when you have those two, you just run QuickCheck. It generates all that stuff randomly and checks the invariants. If some fail, it shows you exactly which ones. It also tells you the random seed, so you can rerun the exact failure if you need to. And as an extra bonus, whenever it finds a failed invariant, it tries to reduce the input to the smallest possible input that still fails (if you think of a tree structure, it reduces it to the smallest subtree that fails the invariant); this is known as shrinking.
And there you have it. In my opinion, this is how you should go about testing stuff with random data. It's definitely not unit tests and I even think you should run it differently (say, have CI run it every now and then, as opposed to running it on every change (since it will quickly get slow)). And let me repeat, it's a different benefit from unit testing - QuickCheck finds bugs, while unit testing drives design.
Usually the unit test frameworks support 'informative failures' as long as you pick the right assertion method.
However, if everything else fails, you can easily trace the dataset to the console or an output file. Low-tech, but it should work.
[TestCaseSource("GetDatasets")]
public void Test(Dataset d)
{
Console.WriteLine(PrettyPrintDataset(d));
// proceed with checks
Console.WriteLine("Worked!");
}
In quickcheck for R we tried to solve this problem as follows:
The tests are actually pseudo-random (the seed is fixed), so you can always reproduce your test results (barring external factors, of course).
The test function returns enough data to reproduce the error, including the assertion that failed and the data that made it fail. A convenience function, repro, called on the return value of test, will land you in the debugger at the beginning of the failing assertion, with arguments set to the witnesses of the failure. If the tests are executed in batch mode, equivalent information is stored in a file and the command to retrieve it is printed to stderr; you can then call repro as before. Whether or not you program in R, I would love to know if this starts to address your requirements. Some aspects of this solution may be hard to implement in languages that are less dynamic or lack first-class functions.
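The fixed-seed part translates to any language. A minimal C# sketch (CheckInvariant and the data shape are invented for illustration):
[Test]
public void RandomizedButReproducible()
{
    const int seed = 12345; // fixed seed: every run generates the same data
    var rng = new Random(seed);
    var data = Enumerable.Range(0, 100).Select(_ => rng.Next(1000)).ToArray();

    // On failure, the message carries what is needed to replay the run.
    Assert.IsTrue(CheckInvariant(data), $"Invariant failed for seed {seed}");
}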
I have this method, which is called from a public setter:
private void EmitEvents(IGridCell[] oldGridCells)
{
bool wasAllInOneGridCell = _isAllInOneGridCell;
_isAllInOneGridCell = GridCells[0] == GridCells[1] &&
GridCells[2] == GridCells[3] &&
GridCells[0] == GridCells[2];
// All points were in the same grid cell before, and still are
if (_isAllInOneGridCell && wasAllInOneGridCell && oldGridCells[0] == GridCells[0])
{
return;
}
for (int i = 0; i < 4; i++)
{
GridCellExited(this, new GridCellChangeEventArgs(oldGridCells[i]));
}
for (int i = 0; i < 4; i++)
{
GridCellEntered(this, new GridCellChangeEventArgs(GridCells[i]));
}
}
What this code exactly does isn't too important. I already have a test for the 2 events. What I'm wondering is what approach should generally be taken when testing semi-complex conditionals such as the if statement in this method.
Since there are 3 boolean comparisons, there are a total of 8 different combinations. Surely I don't want to write 8 unit tests to cover all of those possibilities. But what do I do then? I'd want to check the positive condition and at least one negative condition, but isn't part of the job of unit testing to create a sort of "spec" that the class must adhere to?
Let's say the negative case I choose in my unit test is the one where _isAllInOneGridCell would equate to false. Then let's say someone is doing some refactoring and accidentally removes the && wasAllInOneGridCell condition from the if statement. They have introduced a bug, but since I didn't write a test to cover this exact mundane case, it will not cause a test to fail.
To summarize: on one hand, I view unit testing as a way of defining a contract that the code must follow in order to protect against regressions. Unfortunately, upholding this would mean doing insane things like writing 8 unit tests for a one-line if statement. What is the best route here?
If there are 8 possible input-output scenarios (determined by your conditions), there's not much you can do to get around that; you either test them all or you don't. That's one of the things unit testing is about, as you mentioned: assuring the contract and protecting against regression.
However, if you find your class hard to test, it might be a sign that some refactoring would help. Moving the complex logic to a separate class (or method) springs to mind first. You'll probably still need multiple tests for the logic, but doing so decouples it from your EmitEvents method, which then only needs event-verification tests for two conditions (logic says raise: events fire; logic says skip: they don't).
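A sketch of that extraction (ShouldSkipEmit is an invented name): pull the skip decision into a pure method, and the 8 combinations reduce to cheap table-driven tests, while EmitEvents itself just calls it and returns early:
internal static bool ShouldSkipEmit(
    bool isAllInOneCell, bool wasAllInOneCell, bool firstCellUnchanged)
{
    // All points were in the same grid cell before, and still are
    return isAllInOneCell && wasAllInOneCell && firstCellUnchanged;
}

[TestCase(true, true, true, ExpectedResult = true)]
[TestCase(false, true, true, ExpectedResult = false)]
[TestCase(true, false, true, ExpectedResult = false)]
[TestCase(true, true, false, ExpectedResult = false)]
public bool SkipDecision(bool isAll, bool wasAll, bool unchanged) =>
    ShouldSkipEmit(isAll, wasAll, unchanged);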
It's always a judgement call in these situations: is your logic complex enough to be a separate entity, or is it still part of the class it's in? One way or another, if you want to test your contract fully, I'm afraid there are no shortcuts.
I think it's fair to lump many test cases into one test if that can make it easier.
As long as you don't add a lot of complexity to the test and don't introduce the risk of bugs in the test itself :-)
In some cases it makes sense to generate input combinations in loops to cover all combinations, instead of hand-coding everything.
I agree it's always a judgment call, though, to decide which cases are needed.
Why is it that every unit testing framework (that I know of) requires the expected value in equality tests to always be the first argument:
Assert.AreEqual(42, Util.GetAnswerToLifeTheUniverseAndEverything());
assertEquals(42, Util.GetAnswerToLifeTheUniverseAndEverything());
etc.
I'm quite used to it now, but every coder I try to teach unit testing makes the mistake of reversing the arguments, which I understand perfectly. Google didn't help, maybe one of the hard-core unit-testers here knows the answer?
It seems that most early frameworks used expected before actual (for some unknown reason; a dice roll, perhaps?). Yet as programming languages developed and code grew more fluent, the order got reversed. Most fluent interfaces try to mimic natural language, and unit testing frameworks are no different.
In an assertion, we want to assure that some object matches some condition. This is the natural-language form; if you were to explain your test code, you'd probably say
"In this test, I make sure that computed value is equal to 5"
instead of
"In this test, I make sure that 5 is equal to computed value".
The difference may not be huge, but let's push it further. Consider this:
Assert.That(Roses, Are(Red));
Sounds about right. Now:
Assert.That(Red, Are(Roses));
Hm? You probably wouldn't be too surprised if somebody told you that roses are red. The other way around, "red are roses", raises suspicious questions. Yoda, anybody?
Yoda's making an important point here: reversed order forces you to think.
It gets even more unnatural when your assertions are more complex:
Assert.That(Forest, Has.MoreThan(15, Trees));
How would you reverse that one? More than 15 trees are being had by forest?
This claim (fluency as the driving factor for the change) is reflected in the evolution NUnit went through: originally (Assert.AreEqual) it used expected before actual (the classic style); the fluent extensions (in NUnit's terminology, the constraint-based Assert.That) reversed that order.
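For illustration, here are the two NUnit styles side by side (ComputeAnswer stands in for whatever call is under test):
Assert.AreEqual(42, ComputeAnswer());          // classic style: expected first
Assert.That(ComputeAnswer(), Is.EqualTo(42));  // constraint-based: actual first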
I think it is simply a convention now; as you said, it is adopted by "every unit testing framework (I know of)". If you are using one framework, it would be annoying to switch to another that uses the opposite convention, so (if you are writing a new unit testing framework, for example) it is preferable to follow the existing convention.
I believe this comes from the way some developers prefer to write their equality tests:
if (4 == myVar)
To avoid any unwanted assignment by mistake (writing one "=" instead of "=="): in this case the compiler will catch the error, and you will avoid a lot of trouble chasing a weird runtime bug.
Nobody knows, and it is a source of never-ending confusion. However, not all frameworks follow this pattern (adding to the confusion):
FEST-Assert uses normal order:
assertThat(Util.GetAnswerToLifeTheUniverseAndEverything()).isEqualTo(42);
Hamcrest:
assertThat(Util.GetAnswerToLifeTheUniverseAndEverything(), equalTo(42))
ScalaTest doesn't really make a distinction:
Util.GetAnswerToLifeTheUniverseAndEverything() should equal (42)
I don't know but I've been part of several animated discussions about the order of arguments to equality tests in general.
There are a lot of people who think
if (42 == answer) {
doSomething();
}
is preferable to
if (answer == 42) {
doSomething();
}
in C-based languages. The reason for this is that if you accidentally put a single equals sign:
if (42 = answer) {
doSomething();
}
will give you a compiler error, but
if (answer = 42) {
doSomething();
}
might not, and would definitely introduce a bug that might be hard to track down. So who knows, maybe the person/people who set up the unit testing framework were used to thinking of equality tests in this way -- or they were copying other unit testing frameworks that were already set up this way.
I think it's because JUnit was the precursor of most unit testing frameworks (not that it was the first unit testing framework, but it kicked off an explosion in unit testing). Since JUnit did it that way, all the subsequent frameworks copied this form and it became a convention.
why did JUnit do it that way? I don't know, ask Kent Beck!
My view is that it avoids exceptions: with the expected value first, 42.equals(null) simply returns false, whereas actual-first null.equals(42) throws a NullPointerException (here 42 is the expected value and null the actual one).
Well they had to pick one convention. If you want to reverse it try the Hamcrest matchers. They are meant to help increase readability. Here is a basic sample:
import org.junit.Test;
import static org.junit.Assert.assertThat;
import static org.hamcrest.core.Is.is;
public class HamcrestTest {
    @Test
    public void matcherShouldWork() {
        assertThat(Math.pow(2, 3), is(8.0)); // Math.pow returns a double
    }
}
Surely it makes logical sense to put the expected value first, as it's the first known value.
Think about it in the context of manual tests. A manual test will have the expected value written in, with the actual value recorded afterwards.
Consider the following code (from a requirement that says that 3 is special for some reason):
bool IsSpecial(int value)
{
    if (value == 3)
        return true;
    else
        return false;
}
I would unit test this with a couple of functions: one called TEST(3IsSpecial) that asserts that when passed 3 the function returns true, and another that passes some random value other than 3 and asserts that the function returns false.
When the requirement changes and say it now becomes 3 and 20 are special, I would write another test that verifies that when called with 20 this function returns true as well. That test would fail and I would then go and update the if condition in the function.
Now, what if there are people on my team who do not believe in unit testing and they make this change? They will go and change the code directly, and my second unit test might not test for 20 (it could be picking a random int, or have some other int hardcoded). Now my tests aren't in sync with the code. How do I ensure that when they change the code, some unit test or other fails?
I could be doing something grossly wrong here so any other techniques to get around this are also welcome.
That's a good question. As you note a Not3IsNotSpecial test picking a random non-3 value would be the traditional approach. This wouldn't catch a change in the definition of "special".
In a .NET environment you can use the new code contracts capability to write the test predicate (the postcondition) directly in the method. The static analyzer would catch the defect you proposed. For example:
Contract.Ensures(value == 3 || Contract.Result<bool>() == false);
I think anybody that's a TDD fan is experimenting with contracts now to see use patterns. The idea that you have tools to prove correctness is very powerful. You can even specify these predicates for an interface.
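In context, the postcondition sits at the top of the method body (a sketch using System.Diagnostics.Contracts, with the question's if/else kept as-is):
using System.Diagnostics.Contracts;

bool IsSpecial(int value)
{
    // Postcondition: any value other than 3 must yield false.
    Contract.Ensures(value == 3 || Contract.Result<bool>() == false);
    if (value == 3)
        return true;
    else
        return false;
}
If someone later makes 20 special in the body without touching the contract, the static checker flags the mismatch.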
The only testing approach I've seen that would address this is Model-Based Testing. The idea is similar to the contracts approach: you set up the Not3IsNotSpecial condition abstractly (e.g., IsSpecial(x => x != 3) == false) and let a model execution environment generate concrete tests. I'm not sure, but I think these environments do static analysis as well. Anyway, you let the model execution environment run continuously against your SUT. I've never used such an environment, but the concept is interesting.
Unfortunately, that specific scenario is something that is difficult to guard against. With a function like IsSpecial, it's unrealistic to test all four billion negative test cases, so, no, you're not doing something grossly wrong.
Here's what comes to me off the top of my head. Many repositories have hooks that allow you to run some process on each check-in, such as running the unit tests. It's possible to set a criterion that newly checked in code must reach some threshold of code coverage under unit tests. If the commit does not meet certain metrics, it is rejected.
I've never had to set one of these systems up, so I don't know what is involved, but I do know it's possible.
And believe me, I feel your pain. I work with people who are similarly resistant to unit testing.
One thing you need to think about is why 3 is a special value and others are not. If it defines some aspect of your application, you can pull that aspect out and make an enum of it.
Now you can check that this method returns false if the value doesn't exist in the enum, and for the enum itself you write a test of the possible values. If a new possible value is added, that test should fail.
So your method will become:
bool IsSpecial(int value)
{
    if (SpecialValues.has(value))
        return true;
    else
        return false;
}
and your SpecialValues will be an enum like:
enum SpecialValues {
    Three(3), Twenty(20);

    public final int value;

    SpecialValues(int value) { this.value = value; }
}
Now you should write tests for the enum's possible values: a simple test can check the total number of values, and another can check the values themselves.
The other point to make is that in a less contrived example:
20 might have been a valid condition to test for based on knowledge of the business domain. Writing tests in a BDD style based on knowledge of the business problem might have helped you catch it explicitly.
4 might have been a good value to test for due to its status as a boundary condition. Boundaries are more likely to change in the real world, so they would more likely show up in a full set of test cases; see the sketch below.
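As concrete NUnit rows (assuming the requirement is now "3 and 20 are special"), both the business values and their boundary neighbours get explicit cases, so a change to the rule fails loudly:
[TestCase(3, ExpectedResult = true)]
[TestCase(20, ExpectedResult = true)]
[TestCase(4, ExpectedResult = false)]   // boundary neighbour of 3
[TestCase(19, ExpectedResult = false)]  // boundary neighbour of 20
public bool IsSpecial_KnownValues(int value) => IsSpecial(value);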