In a lot of unit testing frameworks, test cases are independent. For example, GoogleTest says:
Tests should be independent and repeatable. It’s a pain to debug a test that succeeds or fails as a result of other tests. googletest isolates the tests by running each of them on a different object.
I don't understand why it is good that test cases are independent. For example, assume a composite object A which uses objects B and C. It is obvious that if B and C are buggy, what A does will be incorrect too, whether A itself is implemented correctly or not. So I would somehow like to see output like this:
Testing B [SUCCEED]
Testing C [FAILED]
Testing A [FAILED] because dependent test C failed.
Do these frameworks assume that, rather than making tests depend on each other, it is better to test A by mocking B and C? Sometimes writing a correct mock for your classes can be complex (and buggy itself), so I still think dependent tests are better.
You're thinking of the wrong type of (in)dependence. The dependence meant here is that testA2 depends on some form of initialization or setup done in testA1 (which might depend on initialization done by testB2, etc.). This makes your tests very brittle, because out-of-order execution or failing tests will cascade through your entire set of test cases. It also makes it impossible to run a test case in isolation.
As an example, testA1 could create records in a test database, and testA2 expects those records to be present in the test database. If testA1 fails (or hasn't been run yet), then testA2 will fail too.
If instead you have independent test cases (e.g. create the records as part of the setup of testA2), then the failure of a test to run (or run correctly), or out of order execution of tests, will not prevent your other tests from completing successfully.
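A minimal sketch of that kind of per-test setup, assuming NUnit and a hypothetical in-memory TestDatabase/Record pair (none of these names come from the posts; they only illustrate the idea):

    using NUnit.Framework;

    [TestFixture]
    public class RecordQueryTests
    {
        private TestDatabase _db;   // hypothetical in-memory test database

        [SetUp]
        public void CreateRecords()
        {
            // Each test creates the records it needs instead of relying on
            // some earlier test (testA1) having inserted them.
            _db = new TestDatabase();
            _db.Insert(new Record(1, "example"));
        }

        [Test]
        public void FindById_ReturnsTheRecordCreatedInSetup()
        {
            // Passes regardless of which other tests ran before it, and in what order.
            Assert.That(_db.FindById(1).Name, Is.EqualTo("example"));
        }
    }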
Of course, if there is something fundamentally broken in a dependency between your class under test (e.g. A) and another class (e.g. C), then you might have multiple test cases failing, but that is a different dependency problem than meant in the text you quote.
With independent test cases, you can still achieve the test result you desire (tests for A fail if C is broken) if you don't mock/stub the dependency (e.g. C) of the (direct) class under test (e.g. A).
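For example, a sociable test for A could look like this (NUnit assumed; A, C and ComputeTotal are stand-ins for your own classes):

    [Test]
    public void A_ComputesTotal_UsingARealC()
    {
        // C is not mocked, so a defect in C also makes this test for A fail.
        var a = new A(new C());
        Assert.That(a.ComputeTotal(), Is.EqualTo(42));
    }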
To be clear, not everyone likes that type of dependence, and some will strive to make their unit tests independent from this type of problem as well by mocking, stubbing, etc. That is a more opinionated practice though, as it increases complexity, could make your tests more brittle, and makes it easier to miss problems with the interaction between those classes.
However, as I said earlier, that is a different type of independence than what is generally meant (and specifically in the text you quote) with independence of test cases.
I've read over and over that it's important to create unit tests for all (or at least most) of your methods, and to run them repeatedly throughout development. This made perfect sense to me at first, but now that I'm beginning to implement these tests myself I'm feeling less sure. From what I can see, once you've made a test pass, it will always pass, since all the data it is using is mocked up. I feel like there's something I'm not getting.
Let's say you write a method like this:
    /* Verifies email address (just for illustration, not robust code) */
    bool VerifyEmail(string email) {
        return Regex.IsMatch(email, @"^\w+#\w+\.com$");
    }
Maybe you would write a unit test like this:
    /* Again, not robust, just for illustration */
    void TestVerifyEmail() {
        var testCases = new Dictionary<string, bool> {
            { "fake#fake.com", true },
            { "fake#!!!.com", false },
            { "#fake.com", false },
            { "fake#fake.cme", false }
        };
        foreach (string test in testCases.Keys) {
            Test.Assert(VerifyEmail(test) == testCases[test]);
        }
    }
Unless you go and change the test cases, the results of the test function will never ever change, no matter what else happens to the rest of the code, because VerifyEmail() is isolated.
This is an especially simple case, but in most unit test examples I see, even ones that are meant not to operate in a vacuum, they always use totally mocked-up data, and so the test results will never change unless the test itself is changed.
What is the point of running unit tests over and over if, it seems to me, the results will never change? Since all the tests place the chunk of code they're testing into an isolated environment with mocked-up data, the unit tests will pass every time.
I totally get writing unit tests when initially creating the code to ensure it works the way you want, such as in TDD, but once you've done that, what's the point to ever running it again later?
Ideally, unit tests aren't written to verify that the code you wrote works; unit tests are written to ensure that requirements are met. If the unit test suite includes test coverage for every possible requirement (positive and negative) then once you write enough code to pass every test, your project is complete. The benefit to this is that if additional requirements are added later, someone can refactor the project to add in the additional lines and as long as every unit test passes, then the original requirements are still accomplished.
You have a valid point. Theoretically you could use a dependency graph to rerun only those tests that could have broken with new changes. In practice, though, we don't. Why? Mostly because we don't trust that we could write the dependencies down correctly. Remember that we are talking about functional dependencies, not just include headers. I do not know of a tool to auto-generate those.
It may help to keep in mind that there are two common definitions of "unit test". Both share the constraints that tests should be fast, deterministic, and isolated from each other.
There is a school that adds an additional constraint that the system under test should be isolated from all other collaborators in the system. But that definition isn't universal -- it's normal in the "Chicago Style" for the system under test to be a composite made from many different parts.
Martin Fowler:
As xunit testing became more popular in the 2000's the notion of solitary tests came back, at least for some people. We saw the rise of Mock Objects and frameworks to support mocking. Two schools of xunit testing developed, which I call the classic and mockist styles. One of the differences between the two styles is that mockists insist upon solitary unit tests, while classicists prefer sociable tests.
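To make the contrast concrete, here is a sketch of the two styles with hypothetical Order/IPriceCalculator types, NUnit, and Moq for the mockist variant (none of these names come from the quote):

    using Moq;
    using NUnit.Framework;

    // Sociable / classicist: the system under test works with its real collaborator.
    [Test]
    public void Total_WithRealCalculator()
    {
        var order = new Order(new PriceCalculator());
        Assert.That(order.Total(quantity: 3), Is.EqualTo(30m));
    }

    // Solitary / mockist: the collaborator is replaced by a mock.
    [Test]
    public void Total_WithMockedCalculator()
    {
        var calculator = new Mock<IPriceCalculator>();
        calculator.Setup(c => c.UnitPrice()).Returns(10m);

        var order = new Order(calculator.Object);
        Assert.That(order.Total(quantity: 3), Is.EqualTo(30m));
    }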
When private implementation details of the system under test can be refactored into separate parts, it becomes less cost effective to track which tests are dependent on which fragments of production code.
Furthermore, you are regularly going to be running some unit tests; at a minimum, after each refactoring you should be checking that you didn't introduce a regression, so you should be running all of the tests that depend on the code you just changed.
BUT, we're talking about tests that are fast and isolated from one another. Given that you are already taking a moment to run some tests, the marginal costs of running "more" tests are pretty small.
Of course, small isn't zero; I believe that in most cases developers use a weighting strategy to determine how often a test needs to be run: run the tests that are likely to detect a problem more often than those that aren't likely to detect a problem, conditioned on what part of the code base you are actively working on.
It is a very common opinion that unit tests should not rely on test order. I think it is not a strict rule but a recommendation.
But in some cases this does not look good. For example, I have a CUtility class and a CSoket class which uses CUtility methods. I want to run the tests in a single unit test execution, so it seems logical to run the CUtility tests first and the CSoket tests after that.
But unit testing best practice says: do not rely on test order.
Why?
Because unit tests are meant to test individual units of work in isolation. CUtility method calls in CSoket should be mocked in the unit tests for CSoket. This way, the unit tests for CSoket become independent of CUtility tests passing or failing. Then it no longer matters whether the CUtility or the CSoket tests are run first.
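For example, with Moq and a hypothetical ICUtility interface sitting behind CUtility (the interface and the Encode/Send methods are assumed names, not from the question):

    using Moq;
    using NUnit.Framework;

    [Test]
    public void Send_EncodesThePayloadBeforeSending()
    {
        // CUtility is replaced by a mock, so this test neither runs nor
        // depends on any CUtility test.
        var utility = new Mock<ICUtility>();
        utility.Setup(u => u.Encode("hello")).Returns("aGVsbG8=");

        var socket = new CSoket(utility.Object);
        socket.Send("hello");

        utility.Verify(u => u.Encode("hello"), Times.Once());
    }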
To make tests independent of each other is certainly not a law, but a good practice. That means, following this rule will avoid a number of problems. This may, however, come at a price, namely some extra effort for the creation of the test suite. Whether this price is too high depends on your specific circumstances - you will have to judge it yourself. To be able to properly judge it, you should know what kind of problems you avoid by making tests independent of each other:
1) Certain modifications of your test suite can be done without trouble: You can add tests at any place, delete tests, re-order tests. If tests are dependent on each other, any such step can cause your test suite to fail unexpectedly.
2) You can improve tests individually without having to think about impact on other tests. For example, if you figure out that the goal of a certain test can be achieved with a simpler setup, you can make that change locally without having to check whether the changes you make would also impact other test cases.
3) Understanding the test suite is much simpler: Every test can be understood without looking at other tests. And, if a test fails it is much easier to find the cause for the failure because you do not also have to understand the sequence of test executions.
4) Independent tests succeed and fail individually, giving you better feedback in case of test failures. In contrast, dependent tests have a chain effect: if one test in the chain happens to fail, the subsequent tests typically fail as well, just because of the dependency.
5) You can execute your tests selectively, for example to save time during test execution. If tests are dependent on each other, you may either have no choice but to execute them all, or at least the analysis of which tests are needed and which are not becomes more complex.
But, as said at the beginning, there are also situations where dependent tests are the better choice, taking everything into account. This may be rare in unit testing, but at the system testing level it happens often: bringing a system (e.g. a device) into a certain state can be an extremely expensive operation, for example if booting alone takes a significant amount of time.
Some time back I used mocks in TDD (Test Driven Development) when the implementations of dependent interfaces were not available. So I mocked those dependencies and tested my work.
Now implementations of the dependencies are available. So should I remove the mocks, or should I convert them to spies or something similar? I think integration tests can only be done after removing those mocks, or I may have to duplicate them. I am a little confused; any suggestions?
As far as unit tests are concerned - never. Usually this is because you want to keep your unit tests isolated:
Your unit tests (just like any other classes) should usually have a single reason to change/break. When a bug happens, you want to reduce the possible number of offenders to an absolute minimum. Having a unit test fail because some dependency might have failed, or some other dependency might have failed, or maybe, finally, the actual tested class might have failed, is undesirable.
In reality this is quite a stretch, because even changing the interface of a dependency will result in changes to the unit tests of the class using that dependency. What you want to avoid, though, is someone from another team changing the implementation of some dependency you are using and, as a result, your tests breaking, apparently for no reason. This is, again, highly undesirable.
Mocks should be used to isolate your units. Of course, when writing different types of tests (say integration) you don't want isolation. You want real components collaborating together. This is one of the reasons to use actual implementations rather than mocks - but! - this does not invalidate your unit tests. They should remain isolated.
As always: It depends.
If you are using a mock, you are testing against a defined interface, so your unit test is more focused and there could be fewer breaking changes in your tests (such as when something internal needs to be refactored).
If you are using the actual dependencies, then your unit tests are not as focused (they might be integration tests). So these tests are often slower, more difficult to set up and more prone to breaking when refactoring. On the other hand, since you are testing against the actual implementation, you are more likely to find bugs due to the actual implementation behaving differently than you expected.
This is generic to any language in which unit testing is done.
Most unit test libraries provide a way to control the order in which unit tests are run. Let's say I have a TestClass that defines twelve tests. Is there any good reason to try controlling the order the twelve tests run in? Keep in mind that any startup/shutdown code is already taken care of, because most libraries provide a way to do that too. The advantage I see to having an explicit test order is that you can compose your tests so each one uses only functionality that it tests directly or that has already been tested by a prior test. The disadvantage is the maintenance cost of keeping the ordering up to date and ensuring that other developers understand why the order is what it is and work to preserve it.
Is this just not worth the effort?
It's not worth the effort. More importantly than that, it's not a good practice. Each unit test should run independently from the others. If one of your tests depends on another being run first, it's not a good test.
As far as only using functionality that has been tested by another test, you don't need to "order" the tests to achieve this. Let's say you have a piece of basic logic, and there is a test for that logic (Test A). You test a more complex piece of logic in a new Test B, and this new test assumes that the basic logic is working. If something later goes wrong with the basic logic, Test A will fail, and Test B may fail also. That is fine. Test A will pinpoint the problem for you to fix it. It doesn't matter what order the tests run in.
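As a sketch (NUnit assumed; Calculator, Add and SumAll are hypothetical):

    [Test]  // "Test A": basic logic
    public void Add_ReturnsSum()
    {
        Assert.That(Calculator.Add(2, 3), Is.EqualTo(5));
    }

    [Test]  // "Test B": more complex logic that builds on Add
    public void SumAll_AddsEveryElement()
    {
        Assert.That(Calculator.SumAll(new[] { 1, 2, 3 }), Is.EqualTo(6));
    }

    // If Add breaks, both tests may fail; Test A pinpoints the cause,
    // and the order in which they run never matters.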
The advantage I see to having an explicit test order is that you can compose your tests so each one uses only functionality that it tests directly or has already been tested by a prior test.
I don't see much advantage here. Also, dependencies frequently spread across classes, so trying to sequence test runs by feature dependency would likely span multiple test classes, and ordering the run within a single test class wouldn't cover it.
Recently I had an interesting discussion with a colleague about unit tests. We were discussing when maintaining unit tests becomes less productive, for example when your contracts change.
Perhaps someone can enlighten me on how to approach this problem. Let me elaborate:
So let's say there is a class which does some nifty calculations. The contract says that it should calculate a number, or return -1 when it fails for some reason.
I have contract tests that test that. And in all my other tests I stub this nifty calculator thingy.
So now I change the contract: whenever it cannot calculate, it will throw a CannotCalculateException.
My contract tests will fail, and I will fix them accordingly. But, all my mocked/stubbed objects will still use the old contract rules. These tests will succeed, while they should not!
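To illustrate roughly what I mean (using Moq and a made-up ReportGenerator that consumes the calculator through an ICalculator interface):

    [Test]
    public void BuildReport_StillPassesAgainstTheStaleStub()
    {
        // The old contract is baked into the stub: "return -1 on failure".
        var calculator = new Mock<ICalculator>();
        calculator.Setup(c => c.Calculate(It.IsAny<int>())).Returns(-1);

        // The real implementation now throws CannotCalculateException instead,
        // but this test only ever talks to the stub, so it keeps passing.
        var generator = new ReportGenerator(calculator.Object);
        Assert.That(generator.BuildReport(0), Is.EqualTo("calculation failed"));
    }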
The question that arises is: with this much faith in unit testing, how much faith can be placed in such changes? The unit tests succeed, but bugs will occur when testing the application. The tests using this calculator will need to be fixed, which costs time, and it may be stubbed/mocked in a lot of places...
What do you think about this case? I never thought it through thoroughly. In my opinion, these changes to unit tests are acceptable. If I did not use unit tests, I would also see such bugs arise in the test phase (found by testers). Yet I am not confident enough to point out which will cost more (or less) time.
Any thoughts?
The first issue you raise is the so-called "fragile test" problem. You make a change to your application, and hundreds of tests break because of that change. When this happens, you have a design problem. Your tests have been designed to be fragile. They have not been sufficiently decoupled from the production code. The solution is (as it is in all software problems like this) to find an abstraction that decouples the tests from the production code in such a way that the volatility of the production code is hidden from the tests.
Some simple things that cause this kind of fragility are:
Testing for strings that are displayed. Such strings are volatile because their grammar or spelling may change at the whim of an analyst.
Testing for discrete values (e.g. 3) that should be encoded behind an abstraction (e.g. FULL_TIME).
Calling the same API from many tests. You should wrap the API call in a test function so that when the API changes you can make the change in one place (see the sketch below).
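A minimal sketch of such a wrapper (NUnit assumed; ReportingApi, ReportFormat and the fixture field are made-up names):

    using NUnit.Framework;

    private ReportingApi _api;   // assumed to be initialized in the fixture setup

    // Dozens of tests call this helper instead of the API directly; when the
    // API's signature changes, only this one function has to be updated.
    private Report FetchReport(int customerId)
    {
        return _api.GetReport(customerId, ReportFormat.Default);
    }

    [Test]
    public void Report_ContainsTheCustomerName()
    {
        var report = FetchReport(customerId: 42);
        Assert.That(report.CustomerName, Is.EqualTo("ACME"));
    }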
Test design is an important issue that is often neglected by TDD beginners. This often results in fragile tests, which then leads the novices to reject TDD as "unproductive".
The second issue you raised was false positives. You have used so many mocks that none of your tests actually test the integrated system. While testing independent units is a good thing, it is also important to test partial and whole integrations of the system. TDD is not just about unit tests.
Tests should be arranged as follows:
Unit tests provide close to 100% code coverage. They test independent units. They are written by programmers using the programming language of the system.
Component tests cover ~50% of the system. They are written by business analysts and QA. They are written in a language like FitNesse, Selenium, Cucumber, etc. They test whole components, not individual units. They test primarily happy path cases and some highly visible unhappy path cases.
Integration tests cover ~20% of the system. They test small assemblies of components as opposed to the whole system. Also written in FitNesse/Selenium/Cucumber etc. Written by architects.
System tests cover ~10% of the system. They test the whole system integrated together. Again they are written in FitNesse/Selenium/Cucumber etc. Written by architects.
Exploratory manual tests. (See James Bach) These tests are manual but not scripted. They employ human ingenuity and creativity.
It's better to have to fix unit tests that fail due to intentional code changes than not to have tests to catch the bugs that are eventually introduced by those changes.
When your codebase has good unit test coverage, you may run into many unit test failures that are not due to bugs in the code but to intentional changes to contracts or code refactoring.
However, that unit test coverage will also give you the confidence to refactor the code and implement any contract changes. Some tests will fail and will need to be fixed, but other tests will eventually fail due to bugs that you introduced with these changes.
Unit tests surely cannot catch all bugs, even in the ideal case of 100% code/functionality coverage. I think that is not to be expected.
If the tested contract changes, I (the developer) should use my brain to update all code (including test code!) accordingly. If I fail to update some mocks which therefore still produce the old behaviour, that is my fault, not the unit tests'.
It is similar to the case where I fix a bug and produce a unit test for it, but fail to think through (and test) all similar cases, some of which later turn out to be buggy as well.
So yes, unit tests need maintenance just as well as the production code itself. Without maintenance, they decay and rot.
I have similar experiences with unit tests - when you change the contract of one class, you often need to change loads of other tests as well (which will actually still pass in many cases, which makes it even more difficult). That is why I always use higher-level tests as well:
Acceptance tests - test a couple of classes or more. These tests are usually aligned with the user stories that need to be implemented - so you test that the user story "works". These don't need to connect to a DB or other external systems, but they may.
Integration tests - mainly to check external system connectivity, etc.
Full end-to-end tests - test the whole system
Please note that even if you have 100% unit test coverage, you are not even guaranteed that your application starts! That is why you need higher-level tests. There are so many different layers of tests because the lower the level at which you test something, the cheaper it usually is (in terms of development and maintaining test infrastructure, as well as execution time).
As a side note - because of the problem you mentioned, using unit tests teaches you to keep your components as decoupled as possible and their contracts as small as possible - which is definitely a good practice!
One of the rules for unit tests code (and all other code used for testing) is to treat it the same way as production code - no more, no less - just the same.
My understanding of this is that (besides keeping it relevant, refactored, working etc. like production code) it should be looked at the same way from the investment/cost perspective as well.
Your testing strategy should probably include something to address the problem you described in the initial post - something along the lines of specifying which test code (including stubs/mocks) should be reviewed (executed, inspected, modified, fixed etc.) when a designer changes a function/method in production code. Therefore the cost of any production code change must include the cost of doing this - if not, the test code will become a "third-class citizen" and the designers' confidence in the unit test suite, as well as its relevance, will decrease... Obviously, the ROI is in the timing of bug discovery and fixes.
One principle that I rely on here is removing duplication. I generally don't have many different fakes or mocks implementing a given contract (I use more fakes than mocks partly for this reason). When I change the contract, it is natural to inspect every implementation of that contract, production code or test. It bugs me when I find I'm making this kind of change - perhaps my abstractions should have been better thought out - but if the test code is too onerous to change for the scale of the contract change, then I have to ask myself whether the tests are also due for some refactoring.
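For instance, a single hand-written fake shared by all tests keeps the contract in one place (ICalculator and the behavior below are hypothetical):

    // The one fake used wherever the calculator contract is needed. When the
    // contract changes (e.g. -1 becomes CannotCalculateException), the compiler
    // and this single class point at every place that has to be updated.
    public class FakeCalculator : ICalculator
    {
        public int Calculate(int input)
        {
            if (input <= 0)
                throw new CannotCalculateException();
            return input * 2;   // deterministic stand-in behavior for tests
        }
    }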
I look at it this way: when your contract changes, you should treat it like a new contract. Therefore, you should create a whole new set of unit tests for this "new" contract. The fact that you have an existing set of test cases is beside the point.
I second Uncle Bob's opinion that the problem is in the design. I would additionally go back one step and check the design of your contracts.
In short
instead of saying "return -1 for x==0" or "throw CannotCalculateException for x==y", underspecify niftyCalcuatorThingy(x,y) with the precondition x!=y && x!=0 in appropriate situations (see below). Thus your stubs may behave arbitrarily for these cases, your unit tests must reflect that, and you have maximal modularity, i.e. the liberty to arbitrarily change the behavior of your system under test for all underspecified cases - without the need to change contracts or tests.
Underspecification where appropriate
You can differentiate your statement "-1 when it fails for some reason" according to the following criteria: Is the scenario
an exceptional behavior that the implementation can check?
within the method's domain/responsibility?
an exception that the caller (or someone earlier in the call stack) can recover from/handle in some other way?
If and only if 1) to 3) hold, specify the scenario in the contract (e.g. that EmptyStackException is thrown when calling pop() on an empty stack).
Without 1), the implementation cannot guarantee a specific behavior in the exceptional case. For instance, Object.equals() does not specify any behavior when the condition of reflexivity, symmetry, transitivity & consistency is not met.
Without 2), SingleResponsibilityPrinciple is not met, modularity is broken and users/readers of the code get confused. For instance, Graph transform(Graph original) should not specify that MissingResourceException might be thrown because deep down, some cloning via serialization is done.
Without 3), the caller cannot make use of the specified behavior (certain return value/exception). For instance, if the JVM throws an UnknownError.
Pros and Cons
If you do specify cases where 1), 2) or 3) does not hold, you get some difficulties:
a main purpose of a (design by) contract is modularity. This is best achievable if you really separate the responsibilities: When the precondition (the responsibility of the caller) is not met, not specifying the behavior of the implementation leads to maximal modularity - as your example shows.
you don't have any liberty to make changes in the future, not even to a more general version of the method that throws exceptions in fewer cases
exceptional behaviors can become quite complex, so the contracts covering them become complex, error prone and hard to understand. For instance: is every situation covered? Which behavior is correct if multiple exceptional preconditions hold?
The downside of underspecification is that (testing) robustness, i.e. the implementation's ability to react appropriately to abnormal conditions, is harder.
As a compromise, I like to use the following contract schema where possible:
    <(Semi-)formal PRE- and POST-condition, including exceptional behavior where 1) to 3) hold>

    If PRE is not met, the current implementation throws the RTE A, B or C.
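Applied to the calculator from the question, that schema could look roughly like this (the signature and the exception types are assumptions):

    public interface INiftyCalculator
    {
        // PRE:  x != y && x != 0
        // POST: returns the calculated value for (x, y); exceptional behavior
        //       is only part of the contract where 1) to 3) hold.
        //
        // If PRE is not met, the current implementation throws
        // CannotCalculateException or ArgumentException; callers must not
        // rely on which one, or on any other particular behavior.
        int Calculate(int x, int y);
    }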