I heard that all-paths coverage is stronger than other forms of coverage testing.
Can someone give an example of a piece of code containing an error that only all-paths coverage can detect? That is, code for which a set of tests exists that covers the whole function and appears to show the implementation is correct, yet there is still a valid combination of input values proving the function is not implemented 100% correctly.
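To make the shape of the example concrete, here is a hypothetical Java sketch (names and intended behaviour invented for illustration): two tests reach every branch outcome and pass, yet the path combining "first if false, second if true" is still broken.

// Hypothetical sketch. Intended behaviour: return b / a when both are positive,
// b when only b is positive, a when only a is positive, and 0 otherwise.
public class PathCoverageSketch {

    static int ratioOrValue(int a, int b) {
        int divisor = 0;              // bug: should be initialised to 1
        if (a > 0) {
            divisor = a;
        }
        if (b > 0) {
            return b / divisor;       // divides by zero when a <= 0 and b > 0
        }
        return divisor;
    }

    public static void main(String[] args) {
        // These two calls execute every branch outcome (full branch coverage)
        // and both produce the expected result...
        System.out.println(ratioOrValue(2, 6));   // both ifs true  -> 3, correct
        System.out.println(ratioOrValue(0, 0));   // both ifs false -> 0, correct

        // ...but the path "first if false, second if true" was never exercised:
        System.out.println(ratioOrValue(0, 6));   // throws ArithmeticException
    }
}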
Is it a code smell to use a generated file, like a spreadsheet or XML file, as a source for comparison in Unit tests?
Say we have to write a lot of classes that produce various XML files for later processing. The unit tests would be a large number of repetitive assertions of the form foo.getExpectedValue() == expectedValue. Instead of doing this, the developer chooses to use the code they are supposed to be testing to generate the XML, then copies it into test/resources so that all future tests load it into memory as an object and run their assertions against it. Is this a code smell?
There are two practices in what you describe that qualify as test smells.
First, you write that the classes under test are used to create the XML files that are later used to judge correctness. This way you can find out whether the classes have changed, but you cannot tell whether the results were correct in the first place.
To avoid any misunderstanding: the smell is not that generated files are used, but that the files are generated with the code under test. The only way such an approach might make sense would be if the results of the initial run were subject to thorough review, and those reviews would have to be repeated whenever the files are re-generated later.
Secondly, using complete XML files for comparison (generated or not) is another test smell. The reason is that such tests are not very focused: any change to the XML generation will lead to a failing test. That may seem like a good thing, but it applies even to all kinds of intended changes, for example changes to the indentation. Thus you only have tests that tell you "something changed", not "something failed".
To have tests that tell you "something failed" you need more specific tests, for example tests that only look at a certain portion of the generated XML. Some tests would look at the XML structure, others at the data content; you could even have tests that check the indentation. But how? You could, for example, use regular expressions to see whether some interesting portion of the generated XML string looks the way you expect.
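As an illustration only, here is a minimal JUnit 4 sketch of such a focused test; the XML content and the generateOrderXml() placeholder are invented, and in real code the string would come from the class under test.

import static org.junit.Assert.assertTrue;
import java.util.regex.Pattern;
import org.junit.Test;

public class OrderXmlFocusedTest {

    // Placeholder for the real production call that produces the XML string.
    private String generateOrderXml() {
        return "<order>\n  <customerId>42</customerId>\n  <total>19.99</total>\n</order>";
    }

    @Test
    public void customerIdElementContainsTheExpectedValue() {
        String xml = generateOrderXml();

        // Focused check: only the customerId element matters to this test.
        // Indentation, attribute order and unrelated elements can change freely.
        Pattern customerId = Pattern.compile("<customerId>\\s*42\\s*</customerId>");
        assertTrue("customerId element missing or wrong",
                   customerId.matcher(xml).find());
    }
}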
Once you have more focused tests, the test results in case of intended modifications to your code will look different: when your change is successful, only a few of the many test cases will fail, and these will be the ones that test exactly the behaviour you intentionally changed. All other tests will still pass and show you that your change did not break anything unexpectedly. If, in contrast, your change was incorrect, then more or different tests than the expected ones will fail, showing you that the change had unexpected effects.
Yes, it is, and I wouldn't do that. It breaks the principles of a good test. In particular, a good test should be:
Independent - it should not rely on other tests. If subsequent tests have to wait for a file generated by the first test, the tests are not independent.
Repeatable - these tests introduce flakiness due to the file/in-memory dependency, so they may not be repeatable consistently.
Perhaps you could take a step back and ask whether you need to unit test every generated XML file. If the file generation follows the same code execution path (with no logical difference), I wouldn't write a unit test per case. If the XML generation is related to a business operation, I would consider having acceptance tests instead.
If the files are small, you can use a string reader and writer. In general, one is supposed to avoid doing I/O in unit tests; however, I see nothing wrong with using files in some non-unit tests.
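As a rough sketch of that idea (the ReportXmlWriter class and its output are invented for the example), a producer that writes to a java.io.Writer can be unit-tested entirely in memory with a StringWriter:

import static org.junit.Assert.assertEquals;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import org.junit.Test;

public class ReportXmlWriterTest {

    // Invented producer: in real code this would be the class under test,
    // which writes to whatever Writer it is given (a FileWriter in production).
    static class ReportXmlWriter {
        void write(String title, Writer out) throws IOException {
            out.write("<report><title>" + title + "</title></report>");
        }
    }

    @Test
    public void writesTitleElement() throws IOException {
        StringWriter out = new StringWriter();      // in-memory, no file I/O
        new ReportXmlWriter().write("Q3", out);
        assertEquals("<report><title>Q3</title></report>", out.toString());
    }
}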
I want to test (prove) whether my unit tests actually test everything they need to. Specifically, how do I check whether I missed certain asserts?
Take for instance this code:
int AddPositives(int a, int b)
{
    if (a > 0 && b > 0)
        return a + b;
    return -1;
}
And someone wrote a Unit test like so:
[Test]
public void TestAddPositives()
{
    Assert.AreEqual(3, AddPositives(1, 2));
    AddPositives(0, 1);
}
Clearly an assert was missed here, which you might catch in a code-review. But how would you catch this automatically?
So is there something that breaks the tested code on purpose to detect missing asserts? Something that inspects the bytecode, changes constants and deletes code to check whether things can be changed without the unit tests failing?
There are several approaches that can help avoid the problem you have described:
1) The approach you mention (to 'break' the code) is known as mutation testing: create 'mutants' of the system under test (SUT) and see how many of them are detected by the test suite. A mutant is a modification of the SUT, for example obtained by replacing operators in the code: one + in the code could be replaced by a - or a *. But there are many more ways to create mutants. The English Wikipedia has an article about mutation testing, where you will also find a number of references, some of which list tools to support mutation testing.
Mutation testing may help you detect 'inactive' test cases, but only if you have some reference that indicates which mutations should have been detected.
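As a hand-rolled illustration (a Java rendering of the question's example; real mutation tools generate and execute such variants automatically), consider a single mutant that replaces && with ||. The incomplete test still passes against it, so the surviving mutant points at the missing assertion:

public class MutationSketch {

    static int addPositives(int a, int b) {       // original code under test
        if (a > 0 && b > 0) return a + b;
        return -1;
    }

    static int addPositivesMutant(int a, int b) { // mutant: && replaced by ||
        if (a > 0 || b > 0) return a + b;
        return -1;
    }

    public static void main(String[] args) {
        // The incomplete test from the question only checks the (1, 2) case.
        System.out.println(addPositives(1, 2) == 3);        // true
        addPositives(0, 1);                                  // result never asserted

        // Run against the mutant, the same "test" still passes, even though the
        // mutant returns 1 instead of -1 for (0, 1): the mutant survives.
        System.out.println(addPositivesMutant(1, 2) == 3);   // still true
        addPositivesMutant(0, 1);                            // still nobody checks
    }
}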
2) Test-first approaches / test-driven development (TDD) also help to avoid the problem you have described: in a test-first scenario, you write the test before you write the code that makes the test succeed. Therefore, after writing the test, the test suite should fail because of the new test.
Your scenario, namely that you forget to add an assertion, would be detected, because after adding your (not yet complete) test, the test suite would not fail but would simply continue to succeed.
However, after the code is implemented, additional tests are usually written, for example to also address boundary cases. At that point the code is already there, and you would then have to temporarily 'break' it to see the additional tests fail.
3) As others have already pointed out, coverage analysis can help you detect the lack of tests covering a specific part of the code. There are different types of coverage, such as statement coverage and branch coverage. But with a good-quality test suite, a piece of code is often covered several times, to address boundary cases and other scenarios of interest, so leaving out one test case may still go undetected.
To summarize: while all these approaches can help you to some extent, none of them is bulletproof. Neither is a review, because reviewers also miss things. A review may, however, bring additional benefits, such as suggestions for improving the set of tests or the test code.
Some code coverage tools, such as NCrunch (excellent but not free), will annotate your code lines to show whether a test hits them.
In the example you gave, NCrunch would show a small black dot next to the "return -1;" line. This indicates that no existing test passes through that line of code, and it is therefore untested.
This is not perfect, however, since you could still write a test that hits that line of code without asserting that it returned -1, so you can't assume that 100% coverage means you have written all the meaningful tests. Coverage can tell you that return -1 is definitely not unit-tested, but it cannot tell you that you have failed to test a boundary condition (such as checking what happens when a = 0).
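To illustrate that last point, here is a Java rendering of the question's example (for illustration only; the original is C#/NUnit): the two asserting tests below execute every line, so a coverage tool reports 100% coverage, yet the boundary a = 0 is never checked.

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class AddPositivesCoverageTest {

    static int addPositives(int a, int b) {
        if (a > 0 && b > 0)
            return a + b;
        return -1;
    }

    @Test
    public void addsTwoPositives() {
        assertEquals(3, addPositives(1, 2));    // covers "return a + b;"
    }

    @Test
    public void rejectsNegatives() {
        assertEquals(-1, addPositives(-1, 5));  // covers "return -1;"
    }

    // Missing, and invisible to the coverage report: a boundary test such as
    // assertEquals(-1, addPositives(0, 1));
}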
Please note: I'm not asking for your opinion. I'm asking about conventions.
I was just wondering whether I should have both passing and failing tests with appropriate method names such as Should_Fail_When_UsageQuantityIsNegative(), Should_Fail_When_UsageQuantityMoreThan50() and Should_Pass_When_UsageQuantityIs50().
Or instead, should I code them to pass and keep all the tests in a passed state?
When you create unit tests, they should all pass. That doesn't mean that you shouldn't test the "failing" cases. It just means that the test should pass when it "fails."
This way, you don't have to go through your (preferably) large number of tests and manually check that the correct ones passed and failed. This pretty much defeats the purpose of automation.
As Mark Rotteveel points out in the comments, just testing that something failed isn't always enough; make sure that the failure is the correct failure. For example, if you are using error codes, with error_code equal to 0 indicating success, and you want to make sure there is a failure, don't just test that error_code != 0; instead test, for example, that error_code == 19, or whatever the correct failing error code is.
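As a minimal JUnit 4 sketch of that idea (the parse() function, its result type and the error code 19 are all invented):

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class SpecificFailureTest {

    static class ParseResult {
        final int errorCode;                   // 0 means success in this sketch
        ParseResult(int errorCode) { this.errorCode = errorCode; }
    }

    // Stand-in for the real code under test: empty input yields error code 19.
    static ParseResult parse(String input) {
        return input.isEmpty() ? new ParseResult(19) : new ParseResult(0);
    }

    @Test
    public void Should_Fail_With_Code19_When_InputIsEmpty() {
        // Asserting the exact code; checking only errorCode != 0 would also
        // pass if a completely different failure occurred.
        assertEquals(19, parse("").errorCode);
    }
}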
Edit
There is one additional point that I would like to add. While the final version of your code that you deploy should not have failing tests, the best way to make sure that you are writing correct code is to write your tests before you write the rest of the code. Before making any change to your source code, write a unit test (or ideally, a few unit tests) that should fail (or fail to compile) now, but pass after your change has been made. That's a good way to make sure that the tests that you write are testing the correct thing. So, to summarize, your final product should not have failing unit tests; however, the software development process should include periods where you have written unit tests that do not yet pass.
You should not have failing tests unless your program is acting in a way that it is not meant to.
If the intended behavior of your program is for something to fail, and it fails, that should trigger the test to pass.
If the program passes in a place where it should be failing, the test for that portion of code should fail.
In summary, a program is not working properly unless all tests are passing.
You should never have failing tests; as others have pointed out, that defeats the purpose of automation. What you might want are tests that verify your code works as expected when inputs are incorrect. Looking at your examples, Should_Fail_When_UsageQuantityIsNegative() is a test that should pass, but the assertions you make depend on what "fail" means. For example, if your code should throw an IllegalArgumentException when the usage quantity is negative, then you might have a test like this:
@Test(expected = IllegalArgumentException.class)
public void Should_Fail_When_UsageQuantityIsNegative() {
    // code to set usage quantity to a negative value
}
There are a few different ways to interpret the question of whether tests should fail.
A test like Should_Fail_When_UsageQuantityMoreThan50() should instead be a passing test which checks that the appropriate error is thrown: Throws_Exception_When_UsageQuantityMoreThan50() or the like. Many test suites have special facilities for testing exceptions, such as JUnit's expected parameter and Perl modules such as Test::Exception; some can even test for warnings.
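For example, a self-contained JUnit 4 sketch of such a renamed, passing test might look as follows; the validateUsageQuantity() helper is invented here to make the example runnable:

import org.junit.Test;

public class UsageQuantityTest {

    // Invented stand-in for the real validation logic under test.
    static void validateUsageQuantity(int quantity) {
        if (quantity < 0 || quantity > 50) {
            throw new IllegalArgumentException("usage quantity out of range: " + quantity);
        }
    }

    @Test(expected = IllegalArgumentException.class)
    public void Throws_Exception_When_UsageQuantityMoreThan50() {
        validateUsageQuantity(51);   // the test passes because the exception is thrown
    }
}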
Tests should fail during the course of development; that means they're doing their job. You should be suspicious of a test suite which never fails, as it probably has poor coverage. The failing tests will catch changes to public behaviour, bugs, and other mistakes by the developer, in the tests or in the code. But once committed and pushed, the tests should be back to passing.
Finally, there are legitimate cases where you have a known bug or missing feature which cannot be fixed or implemented at this time. Sometimes bugs get fixed incidentally, so it's good to write a test for the bug anyway: when that test starts passing you get a notice, and you know the bug has been fixed. Different testing systems allow you to write tests which are expected to fail and which are only reported when they pass. In Perl this is the TODO test, or expected failure. POSIX has a number of results such as UNRESOLVED, UNSUPPORTED and UNTESTED to cover this case.
We use several AST transforms in our Groovy code, such as @ToString and @EqualsAndHashCode. We use these so we don't have to maintain and test that code ourselves. The problem is that code coverage metrics (we are using JaCoCo right now, but are open to change if it helps) don't know these are auto-generated methods, and they cause a lot of code to appear uncovered even though it isn't actually code we're writing.
Is there a way to exclude these from coverage metrics in any of the tools?
I guess you could argue that since we're adding the annotations ourselves, we should still be testing the generated code, since a unit test shouldn't care how these methods are created, only that they work.
I had a similar issue with @Log and the conditionals that it inserts into the code. That gets reported (by Cobertura) as a lack of branch coverage.
But as you said: it just reports it correctly. The code is not covered.
If you don't need the code, you should not have generated it. If you need it and aim for full test coverage, you must test it or at least "exercise" it, i.e. somehow use it from your test cases even without asserts.
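As a small sketch of what "exercising" the generated methods can look like, consider the test below. Person stands in for the real Groovy class annotated with @ToString and @EqualsAndHashCode; its hand-written methods exist only so this Java example compiles on its own.

import static org.junit.Assert.*;
import java.util.Objects;
import org.junit.Test;

public class GeneratedMethodsTest {

    // Stand-in for the Groovy class whose equals/hashCode/toString are generated.
    static class Person {
        final String name;
        final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
        @Override public boolean equals(Object o) {
            return o instanceof Person && ((Person) o).name.equals(name) && ((Person) o).age == age;
        }
        @Override public int hashCode() { return Objects.hash(name, age); }
        @Override public String toString() { return "Person(" + name + ", " + age + ")"; }
    }

    @Test
    public void generatedEqualsHashCodeAndToStringAreExercised() {
        Person a = new Person("Ada", 36);
        Person b = new Person("Ada", 36);

        // Executes (and lightly checks) the generated methods so coverage
        // tools see them run: equal values give equal objects, equal hash
        // codes, and a toString() that mentions the property values.
        assertEquals(a, b);
        assertEquals(a.hashCode(), b.hashCode());
        assertTrue(a.toString().contains("Ada"));
    }
}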
From a test methodology standpoint, not covering generated code is equally questionable as using exclusion patterns. From a pragmatic standpoint, you may just want to live with it.
I'm preparing some educational/training material with respect to Unit Testing, and want to double check some vocabulary.
In an example I'm using, the developer has tested a Facade for all possible inputs but hasn't tested the more granular units 'behind' it.
Would one still say that the tests have "full coverage", given that they cover the entire range of inputs? I feel like "full coverage" is generally used to denote coverage of code/units... but there would certainly be full something when testing all possible inputs.
What's the other word I'm looking for?
If all possible inputs don't give you 100% code coverage, you have 100% scenario coverage, but not complete code coverage.
On that note, if you have 100% scenario coverage without full code coverage, you have dead code and you should think real hard about why it exists.
If you decide to use "full coverage", you might run into trouble, because most literature that talks about coverage (and indeed the tools that measure it) refers to the lines of code that are executed in the code under test after all tests are run.
The test cases that you propose would be said to cover the domain of the function (and assuming a function that is at least 1-to-1, they will cover the range as well).
It is full code coverage of the involved classes, but clearly not of the full system source. Tools can report this at different levels.
Note that it doesn't guarantee the code is correct, as a scenario that needs to be handled might have been missed altogether (both in the tests and in the feature code). Additionally, the tests could trigger all code paths without asserting correctly.