Should failing tests make the continuous build fail? - unit-testing

If one has a project that has tests that are executed as part of the build procedure on a build machine, if a set tests fail, should the entire build fail?
What are the things one should consider when answering that question? Does it matter which tests are failing?
Background information that prompted this question:
Currently I am working on a project that has NUnit tests that are done as part of the build procedure and are executed on our cruise control .net build machine.
The project used to be setup so that if any tests fail, the build fails. The reasoning being if the tests fail, that means the product is not working/not complete/it is a failure of a project, and hence the build should fail.
We have added some tests that although they fail, they are not crucial to the project (see below for more details). So if those tests fail, the project is not a complete failure, and we would still want it to build.
One of the tests that passes verifies that incorrect arguments result in an exception, but the test does not pass is the one that checks that all the allowed arguments do not result in an exception. So the class rejects all invalid cases, but also some valid ones. This is not a problem for the project, since the rejected valid arguments are fringe cases, on which the application will not rely.

If it's in any way doable, then do it. It greatly reduces the broken-window-problem:
In a system with no (visible) flaws, introducing a small flaw is usually seen as a very bad idea. So if you've got a project with a green status (no unit test fails) and you introduce the first failing test, then you (and/or your peers) will be motivated to fix the problem.
If, on the other side, there are known-failing tests, then adding just another broken test is seen as keeping the status-quo.
Therefore you should always strive to keep all tests running (and not just "most of them"). And treating every single failing test as reason for failing the build goes a long way towards that goal.

If a unit test fails, some code is not behaving as expected. So the code shouldn't be released.
Although you can make the build for testing/bugfixing purposes.

If you felt that a case was important enough to write a test for, then if that test is failing, the software is failing. Based on that alone, yes, it should consider the build a failure and not continue. If you don't use that reasoning, then who decides what tests are not important? Where is the line between "if this fails it's ok, but if that fails it's not"? Failure is failure.

I think a nice setup like yours should always build successfully, including all unit tests passed.
Like Gamecat said, the build itself is succeeded, but this code should never go to production.
Imagine one of your team members introducing a bug in the code which that one unit test (which always fails) covers. It won't be discovered by the test since you allow that one test to always fail.
In our team we have a simple policy: if all tests don't pass, we don't go to production with the code. This is also a very simple to understand for our project manager.

In my opinion it really depends on your Unit Tests,...
if your Unit tests are really UNIT tests (like they should be => "reference to endless books ;)" )
then the build should fail, because something is not behaving as should...
but most often (unfortunately to often seen), in so many projects these unit tests only cover some 'edges' and/or are integration tests, then the build should go on
(yes, this is a subjective answer ;)
in short:
do you know the unit tests to be fine: fail;
else: build on

The real problem is with your failing tests. You should not have a unit test where it's OK to fail because it's an edge case. Decide whether the edge case is important or not - if not then delete the unit test; if yes then fix the code.
Like some of the other answers implied, it's definitely a code smell when unit tests fail. If you live with the smell, then you're less likely to spot the next problem

All the answers have been great, here is what I decided to do:
Make the tests (or if need be split a failing test) that are not crucial be ignored by NUnit (I remembered this feature after asking the question). This allows:
The build can fail if any tests fail, hence reducing the smelliness
The tests that are ignored have to be defended to project manager (whomever is in charge)
Any tests that are ignored are marked in a special way
I think that is the best compromise, forcing people to fix the tests, but not necessarily right away (but they have to defend their decision of not fixing it now since everyone knows what they did).
What I actually did: fixed the broken tests.

Related

Should I have failing tests?

Please note: I'm not asking for your opinion. I'm asking about conventions.
I was just wondering whether I should have both passing and failing tests with appropriate method names such as, Should_Fail_When_UsageQuantityIsNegative() , Should_Fail_When_UsageQuantityMoreThan50() , Should_Pass_When_UsageQuantityIs50().
Or instead, should I code them to pass and keep all the tests in Passed condition?
When you create unit tests, they should all pass. That doesn't mean that you shouldn't test the "failing" cases. It just means that the test should pass when it "fails."
This way, you don't have to go through your (preferably) large number of tests and manually check that the correct ones passed and failed. This pretty much defeats the purpose of automation.
As Mark Rotteveel points out in the comments, just testing that something failed isn't always enough. Make sure that the failure is the correct failure. For example, if you are using error codes and error_code being equal to 0 indicates a success and you want to make sure that there is a failure, don't test that error_code != 0; instead, test for example that error_code == 19 or whatever the correct failing error code is.
Edit
There is one additional point that I would like to add. While the final version of your code that you deploy should not have failing tests, the best way to make sure that you are writing correct code is to write your tests before you write the rest of the code. Before making any change to your source code, write a unit test (or ideally, a few unit tests) that should fail (or fail to compile) now, but pass after your change has been made. That's a good way to make sure that the tests that you write are testing the correct thing. So, to summarize, your final product should not have failing unit tests; however, the software development process should include periods where you have written unit tests that do not yet pass.
You should not have failing tests unless your program is acting in a way that it is not meant to.
If the intended behavior of your program is for something to fail, and it fails, that should trigger the test to pass.
If the program passes in a place where it should be failing, the test for that portion of code should fail.
In summary, a program is not working properly unless all tests are passing.
You should never have failing tests, as others have pointed out, this defeats the purpose of automation. What you might want are tests that verifies your code works as expected when inputs are incorrect. Looking at your examples Should_Fail_When_UsageQuantityIsNegative() is a test that should pass, but the assertions you make depend on what fail means. For example, if your code should throw an IllegalArgumentException when usage quantity is negative then you might have a test like this:
#Test(expected = IllegalArgumentException.class)
public void Should_Fail_When_UsageQuantityIsNegative() {
// code to set usage quantity to a negative value
}
There's a few different ways to interpret the question if tests should fail.
A test like Should_Fail_When_UsageQuantityMoreThan50() should instead be a passing test which checks the appropriate error is thrown. Throws_Exception_When_UsageQuantityMoreThan50() or the like. Many test suites have special facilities for testing exceptions: JUnit's expected parameter and Perl modules such as Test::Exception and can even test for warnings.
Tests should fail during the course of development, it means they're doing their job. You should be suspicious of a test suite which never fails, it probably has bad coverage. The failing tests will catch changes to public behavior, bugs, and other mistakes by the developer or the tests or the code. But when committed and pushed, the tests should be returned to passing.
Finally, there are legitimate cases where you have a known bug or missing feature which cannot at this time be fixed or implemented. Sometimes bugs are incidentally fixed, so it's good to write a test for it. When it passes, you know the bug has been fixed, and you want a notice when it starts passing. Different testing systems allow you to write tests which are expected to fail, and will only be visible if they pass. In Perl this is the TODO or expected failure. POSIX has a number of results such as UNRESOLVED, UNSUPPORTED and UNTESTED to cover this case.

nUnit Test interdependencies/hierarchies

Is there a way to give nUnit a hierarchy of tests, so that later test aren't run if earlier test fail?
Suppose I have a set of data that is expected to be processed and ordered, and I’ve got a UT that tests that the ordering is done, but also a bunch other tests that test the processing and which assume that the ordering has been successful and will hence fail spuriously if that’s false.
(And suppose that breaking this dependency is not feasible - I know that is the ideal solution!)
Then it would be nice to tell nUnit … "If the ordering test has failed, then don’t even bother with these ones (throw inconclusive, maybe?) ‘cos we know they ain’t gonna work."
Obviously I could do it manually, but that's going to add a lot of essentially irrelevant cruft to the tests (we have nice short tests, so a "call the head test and catch Exceptions" block would double the length of some tests)
I'm hoping that nUnit might JustDoThis already, but I can’t see it anywhere?
As mentioned in your question, the most 'proper' fix is to break the dependency between the tests. If you really want to do this though, NUnit does support assumptions (which are a bit like assertions, but at the start of your test), which could be used to check for some precondition that affects the validity of your test. Searching for "NUnit Assume" brought up the following StackOverflow question, which looks similar to this one: How do I ignore a test based on another test in NUnit?
I believe hierarchy in unit tests is a bad idea. As you said your unit tests are small, nice and independent of course. And it should stay the same. Adding some dependencies breaks it.
To offer some workaround, I would suggest to use the Category attribute combined with the /stoponerror option. You could group tests by its importance and run it by this categories. And use stoponerror option will stop executing the rest of tests when the first fails. I'm not sure if this also works in integration with Visual Studio but you can achieve this with NUnit command line tool. See this

Should my unit tests for existing methods fail?

In TDD or BDD, we start from failing our unit tests, then fix the methods under test to let the unit tests pass.
Often times, at a new job, we need to write unit tests for existing methods. Probably not a good practice, but this does happen. That's the situation I am in now.
So, here is my question: Should I let my unit tests for existing methods fail? Thank you.
You're not dealing with TDD when you're adding unit tests for working code. However, it is still a good idea to make the tests fail when you first write them (for example, in an extremely simple case, if the actual output will be abc, you write the test to expect abd) so that you know that the tests do fail when the output is different from what the test says is expected. Once you've proved that the tests can fail, you can make them pass by fixing the expected output.
The worst situation to be in is to add unit tests to working code that pass when they're first written. Then you modify the code, changing the output, and the unit tests still pass — when they shouldn't. So, make sure your new unit tests do detect problems — which really does mean writing them so they fail at first (but that may mean you have to write them with known-to-be-bogus expected results).
No, you should not try to make your tests fail.
Why does TDD make the test fail first?
The first reason is to ensure that you really are writing your tests before your code. If you write a test and it passes right away, you are writing code before your test. We write failing tests to be sure that we really are writing the tests first. In your case, its too late for that, so this reason doesn't apply.
A second reason is to verify that the tests are correct. The danger of a test is that it will not be testing the functionality that you think it is. Having the test fail for the correct reason gives confidence that the test is actually working. However, you cannot have the test fail for the correct reason. The code works, and the test is supposed to detect whether or not the good is working. So there is no way to write a test that actually fails for the correct reason.
You can, as the other answer suggested, write test code that is wrong, watch if fail and then correct it. But that fails because the test is wrong, its not really showing that your test correctly catches actual errors. At best it really shows that your assertions work. But generally we are pretty confident that the assertions work, and we don't need to constantly retest our assertions functions.
I don't think you gain much by trying to get your tests to fail when you are adding tests to already working code. So I wouldn't do it.

Version control and test-driven development

The standard process for test-driven development seems to be to add a test, see it fail, write production code, see the test pass, refactor, and check it all into source control.
Is there anything that allows you to check out revision x of the test code, and revision x-1 of the production code, and see that the tests you've written in revision x fail? (I'd be interested in any language and source control system, but I use ruby and git)
There may be circumstances where you might add tests that already pass, but they'd be more verification than development.
A couple of things:
After refactoring the test, you run the test again
Then, you refactor the code, then run the test again
Then, you don't have to check in right away, but you could
In TDD, there is no purpose in adding a test that passes. It's a waste of time. I've been tempted to do this in order to increase code coverage, but that code should have been covered by tests which actually failed first.
If the test doesn't fail first, then you don't know if the code you then add fixes the problem, and you don't know if the test actually tests anything. It's no longer a test - it's just some code that may or may not test anything.
Simply keep your tests and code in seperate directories, and then you can check out one version of the tests and another of the code.
That being said, in a multi-developer environment you generally don't want to be checking in code where the tests fail.
I would also question the motivation for doing this? If it is to "enforce" the failing test first, then I would point you to this comment from the father of (the promotion of) TDD.
"There may be circumstances where you might add tests that already pass, but they'd be more verification than development."
In TDD you always watch a test fail before making it pass so that you know it works.
As you've found, sometimes you want to explicitly describe behaviour that is covered by code you've already written but when considered from outside the class under test is a separate feature of the class. In that case the test will pass.
But, still watch the test fail.
Either write the test with an obviously failing assertion and then fix the assertion to make it pass. Or, temporarily break the code and watch all affected tests fail, including the new one. And then fix the code to make it work again.
If you keep your production and test code in separate versioning areas (e.g. separate projects/source trees/libraries/etc.), most version control systems allow you to checkout previous versions of code and rebuild them. In your case, you could checkout the x-1 version of production code, rebuild it, then run your test code against the newly built/deployed production deployable.
One thing that might help would be to tag/label all of your code when you do a release, so that you can easily fetch an entire source tree for a previous version of your code.
Is there anything that allows you to
check out revision x of the test code,
and revision x-1 of the production
code, and see that the tests you've
written in revision x fail?
I think that you are looking for the keyword continuous integration. There are many tools that actually are implemented as post-commit hooks in version control systems (aka something that runs on servers/central repository after each commit): for example, they will run your unit tests after each commits, and email the committers if a revision introduces a regression.
Such tools are perfectly able to detect which tests are new and never passed from old tests that used to pass and that currently fail due to a recent commit, which means that using TDD and continuous integration altogether is just fine: you will probably be able to configure your tools not to scream when a new failing test is introduced, and to complain only on regressions.
As always, I'll direct you to Wikipedia for a generic introduction on the topic. And a more detailed, quite famous resource would be the article from Martin Fowler
If you git commit after writing your failing tests, and then again when they are passing, you should at a later time be able to create a branch at the point where the tests fail.
You can then add more tests, verify that they also fail, git commit, git merge and then run the tests with the current code base to see if the work you already did will cause the test to pass or if you now need to do some more work.

Testing a test?

I primarily spend my time working on automated tests of win32 and .NET applications, which take about 30% of our time to write and 70% to maintain. We have been looking into methods of reducing the maintenance times, and have already moved to a reusable test library that covers most of the key components of our software. In addition we have some work in progress to get our library to a state where we can use keyword based testing.
I have been considering unit testing our test library, but I'm wondering if it would be worth the time. I'm a strong proponent of unit testing of software, but I'm not sure how to handle test code.
Do you think automated Gui testing libraries should be unit tested? Or is it just a waste of time?
First of all I've found it very useful to look at unit-test as "executable specifications" instead of tests. I write down what I want my code to do and then implement it. Most of the benefits I get from writing unit tests is that they drive the implementation process and focus my thinking. The fact that they're reusable to test my code is almost a happy coincidence.
Testing tests seems just a way to move the problem instead of solving it. Who is going to test the tests that test the tests? The 'trick' that TDD uses to make sure tests are actually useful is by making them fail first. This might be something you can use here too. Write the test, see it fail, then fix the code.
I dont think you should unit test your unit tests.
But, if you have written your own testing library, with custom assertions, keyboard controllers, button testers or what ever, then yes. You should write unit tests to verify that they all work as intented.
The NUnit library is unit tested for example.
In theory, it is software and thus should be unit-tested. If you are rolling your own Unit Testing library, especially, you'll want to unit test it as you go.
However, the actual unit tests for your primary software system should never grow large enough to need unit testing. If they are so complex that they need unit testing, you need some serious refactoring of your software and some attention to simplifying your unit tests.
You might want to take a look at Who tests the tests.
The short answer is that the code tests the tests, and the tests test the code.
Huh?
Testing Atomic Clocks
Let me start with an analogy. Suppose you are
travelling with an atomic clock. How would you know that the clock is
calibrated correctly?
One way is to ask your neighbor with an atomic clock (because everyone
carries one around) and compare the two. If they both report the same
time, then you have a high degree of confidence they are both correct.
If they are different, then you know one or the other is wrong.
So in this situation, if the only question you are asking is, "Is my
clock giving the correct time?", then do you really need a third clock
to test the second clock and a fourth clock to test the third? Not if
all. Stack Overflow avoided!
IMPO: it's a tradeoff between how much time you have and how much quality you'd like to have.
If I would be using a home made test harnas, I'd test it if time permits.
If it's a third party tool I'm using, I'd expect the supplier to have tested it.
There really isn't a reason why you could/shouldn't unit test your library. Some parts might be too hard to unit test properly, but most of it probably can be unit tested with no particular problem.
It's actually probably particularly beneficial to unit test this kind of code, since you expect it to be both reliable and reusable.
The tests test the code, and the code tests the tests. When you say the same intention in two different ways (once in tests and once in code), the probability of both of them being wrong is very low (unless already the requirements were wrong). This can be compared to the dual entry bookkeeping used by accountants. See http://butunclebob.com/ArticleS.UncleBob.TheSensitivityProblem
Recently there has been discussion about this same issue in the comments of http://blog.objectmentor.com/articles/2009/01/31/quality-doesnt-matter-that-much-jeff-and-joel
About your question, that should GUI testing libraries be tested... If I understood right, you are making your own testing library, and you want to know if you should test your testing library. Yes. To be able to rely on the library to report tests correctly, you should have tests which make sure that library does not report any false positives or false negatives. Regardless of whether the tests are unit tests, integration tests or acceptance tests, there should be at least some tests.
Usually writing unit tests after the code has been written is too late, because then the code tends to be more coupled. The unit tests force the code to be more decoupled, because otherwise small units (a class or a closely related group of classes) can not be tested in isolation.
When the code has already been written, then usually you can add only integration tests and acceptance tests. They will be run with the whole system running, so you can make sure that the features work right, but covering every corner case and execution path is harder than with unit tests.
We generally use these rules of thumb:
1) All product code has both unit tests (arranged to correspond closely with product code classes and functions), and separate functional tests (arranged by user-visible features)
2) Do not write tests for 3rd party code, such as .NET controls, or third party libraries. The exception to this is if you know they contain a bug which you are working around. A regression test for this (which fails when the 3rd party bug disappears) will alert you when upgrades to your 3rd party libraries fix the bug, meaning you can then remove your workarounds.
3) Unit tests and functional tests are not, themselves, ever directly tested - APART from using the TDD procedure of writing the test before the product code, then running the test to watch it fail. If you don't do this, you will be amazed by how easy it is to accidentally write tests which always pass. Ideally, you would then implement your product code one step at a time, and run the tests after each change, in order to see every single assertion in your test fail, then get implemented and start passing. Then you will see the next assertion fail. In this way, your tests DO get tested, but only while the product code is being written.
4) If we factor out code from our unit or functional tests - creating a testing library which is used in many tests, then we do unit test all of this.
This has served us very well. We seem to have always stuck to these rules 100%, and we are very happy with our arrangement.
Kent Beck's book "Test-Driven Development: By Example" has an example of test-driven development of a unit test framework, so it's certainly possible to test your tests.
I haven't worked with GUIs or .NET, but what concerns do you have about your unit tests?
Are you worried that it may describe the target code as incorrect when it is functioning properly? I suppose this is a possibility, but you'd probably be able to detect that if this was happening.
Or are you concerned that it may describe the target code as functioning properly even if it isn't? If you're worried about that, then mutation testing may be what you're after. Mutation testing changes parts of code being tested, to see if those changes cause any tests to fail. If it doesn't, then either the code isn't being run, or the results of that code isn't being tested.
If mutation testing software isn't available on your system, then you could do the mutation manually, by sabotaging the target code yourself and seeing if it causes the unit tests to fail.
If you're building a suite of unit testing products that aren't tied to a particular application, then maybe you should build a trivial application that you can run your test software on and ensure it gets the failures and successes expected.
One problem with mutation testing is that it doesn't ensure that the tests cover all potential scenarios a program may encounter. Instead, it only ensures that the scenarios anticipated by the target code are all tested.
Answer
Yes, your GUI testing libraries should be tested.
For example, if your library provides a Check method to verify the contents of a grid against a 2-dimensional array, you want to be sure that it works as intended.
Otherwise, your more complex test cases that test business processes in which a grid must receive particular data may be unreliable. If an error in your Check method produces false negatives, you'll quickly find the problem. However, if it produces false positives, you're in for major headaches down the line.
To test your CheckGrid method:
Populate a grid with known values
Call the CheckGrid method with the values populated
If this case passes, at least one aspect of CheckGrid works.
For the second case, you're expecting the CheckGrid method to report a test failure.
The particulars of how you indicate the expectation will depend on your xUnit framework (see an example later). But basically, if the Test Failure is not reported by CheckGrid, then the test case itself must fail.
Finally, you may want a few more test case for special conditions, such as: empty grids, grid size mismatching array size.
You should be able to modify the following dunit example for most frameworks in order to test that CheckGrid correctly detects errors:
begin
//Populate TheGrid
try
CheckGrid(<incorrect values>, TheGrid);
LFlagTestFailure := False;
except
on E: ETestFailure do
LFlagTestFailure := True;
end;
Check(LFlagTestFailure, 'CheckGrid method did not detect errors in grid content');
end;
Let me reiterate: your GUI testing libraries should be tested; and the trick is - how do you do so effectively?
The TDD process recommends that you first figure out how you intend testing a new piece of functionality before you actually implement it. The reason is, that if you don't, you often find yourself scratching your head as to how you're going to verify it works. It is extremely difficult to retrofit test cases onto existing implementations.
Side Note
One thing you said bothers me a little... you said it takes "70% (of your time) to maintain (your tests)"
This sounds a little wrong to me, because ideally your tests should be simple, and should themselves only need to change if your interfaces or rules change.
I may have misunderstood you, but I got the impression that you don't write "production" code. Otherwise you should have more control over the cycle of switching between test code and production code so as to reduce your problem.
Some suggestions:
Watch out for non-deterministic values. For example, dates and artificial keys can play havoc with certain tests. You need a clear strategy of how you'll resolve this. (Another answer on its own.)
You'll need to work closely with the "production developers" to ensure that aspects of the interfaces you're testing can stabilise. I.e. They need to be cognisant of how your tests identify and interact with GUI components so they don't arbitrarily break your tests with changes that "don't affect them".
On the previous point, it would help if automated tests are run whenever they make changes.
You should also be wary of too many tests that simply boil down to arbitrary permutations. For example, if each customer has a category A, B, C, or D; then 4 "New Customer" tests (1 for each category) gives you 3 extra tests that don't really tell you much more than the first one, and are 'hard' to maintain.
Personally, I don't unit test my automation libraries, I run them against a modified version of the baseline to ensure all the checkpoints work. The principal here is that my automation is primarily for regression testing, e.g. that the results for the current run are the same as the expect results (typically this equates to the results of the last run). By running the tests against a suitably modified set of expected results, all the tests shoud fail. If they don't you have a bug in your test suite. This is a concept borrowed from mutation testing that I find works well for checking GUI automation suites.
From your question, I can understand that you are building a Keyword Driven Framework for performing automation testing. In this case, it is always recommended to do some white box testing on the common and GUI utility functions. Since you are interested in Unit testing each GUI testing functionality in your libraries, please go for it. Testing is always good. It is not a waste of time, I would see it as a 'value-add' to your framework.
You have also mentioned about handling test code, if you mean the test approach, please group up different functions/modules performing similar work eg: GUI element validation (presence), GUI element input, GUI element read. Group for different element types and perform a type unit test approach for each group. It would be easier for you to track the testing. Cheers!
I would suggest test the test is a good idea and something that must be done. Just make sure that what you're building to test your app is not more complex that the app itself. As it was said before, TDD is a good approach even when building automated functional tests (I personally wouldn't do it like that, but it is a good approach anyway). Unit testing you test code is a good approach as well. IMHO, if you're automating GUI testing, just go ahead with whatever manual tests are available (you should have steps, raw scenarios, expected results and so on), make sure they pass. Then, for other test that you might create and that are not already manually scripted, unit test them and follow a TDD approach. (then if you have time you could unit test the other ones).
Finally, keyword driven, is, IMO, the best approach you could follow because it gives you the most flexible approach.
You may want to explore a mutation testing framework ( if you work with Java : check out PIT Mutation Testing ). Another way to assess the quality of your unit testing is to look at reports provided by tools such as SonarQube ; the reports include various coverage metrics;