Integration test fails, all unit tests succeed: what's the conclusion?

I have a fairly general question.
I wrote several unit tests for a group of connected classes, and all of them pass.
However, after implementing my first integration tests I got some errors, because I hadn't taken some shared behavior between the classes into account.
My question is:
Am I supposed to fix my unit tests because they failed to find the bugs? Or are unit tests not supposed to take the other classes into account, so that I just fix my production code, because it is sufficient that the integration tests cover such things?
EDIT:
Given the responses, it seems necessary to specify my problem further. I am still not sure which test is responsible for what.
Let's assume I have a class (let's call it ListClass) which contains a list of objects, an index, a getCurrentValue function, and an incrementIndex function. I guess the function names speak for themselves.
The index starts at -1, and if you call getCurrentValue while the index is -1 it returns null.
I also have a reset function which sets the index back to -1.
My tests have 100% code coverage, and I created mutants which were detected by my tests, so everything seems fine.
My second class is a handler which is responsible for setting up the next permutation every time I call handler.next().
// listContainer1.index = -1, listContainer2.index = -1,
handler.add(listContainer1) //has 2 values
handler.add(listContainer2) //has 2 values
handler.next() // listContainer1.index = 0, listContainer2.index = 0
handler.next() // listContainer1.index = 0, listContainer2.index = 1
handler.next() // listContainer1.index = 1, listContainer2.index = 0
handler.next() // listContainer1.index = 1, listContainer2.index = 1
Again 100% code coverage, and the mutants were detected (I used mocking to define the behaviour of the listContainer).
Here is the integration test which broke the code:
Calling reset on a listContainer after several handler.next() calls.
Because the container index was -1, the next handler.next() call resulted in an unexpected state.
How can I predict such an outcome in a unit test for my handler without depending on my ListClass? Or, as I asked in my original post: is it sufficient if the integration test catches the error?
To me it seems both unit test classes covered their own responsibilities ...

First you have to determine where the problems are.
Are your unit tests relevant? This includes:
Do they have meaningful asserts?
How much of your code is covered by unit tests?
Do you test if it works in valid conditions and fails in invalid?
You can only fix your unit tests if there is actually something wrong with them. Typically, though, you build your tests in such a manner that you can depend on them. Since you feel you can't, make that your first priority (if you can't trust them, they are essentially worthless).
Since the problems occurred when you started with integration tests, your first step (assuming sufficient unit tests) should be to verify that the integration itself works. If an external dependency goes wrong somewhere, it is not necessarily a programming error; it may be a configuration that isn't set up properly (like a database connection).
Since your situation seems to involve classes interfering with each other's behaviour, I believe you'll have to revisit your test design, because you have likely made some mistakes there. Even though the unit tests themselves pass, you have to make sure that they are isolated, but you should also be aware that a unit might be more than just one method or class (more on that here).
One example of something you should look out for are state changes, typically something that happens as a result of a void method but certainly possible as a side-effect of any type of action. This might interfere with assumptions you make in other tests.
Because of the general nature of the question I have kept the answer pretty general as well but if you'd share some specifics then I could provide a more specific answer.
In short: bad unit tests are unreliable and thus useless unit tests; make sure they are sufficiently relevant to your application.

Unit tests check that each part is working properly. Integration tests check that the parts fit together. If either one of them could cover everything you wouldn't need the other.
Still, eventually you will need to change your unit tests, not because they are wrong but because, if the components of your program don't fit together, you'll need to alter the interface behavior of some of them (or the code that uses another component's interface), which means you'll need to change some unit tests.
For example, if Foo depends on Bar, both pass their unit tests, but an integration test that checks them together fails, that means there is something wrong in the way Foo uses Bar. This means that either (or both!):
You'll have to change the behavior of Bar. This means some Bar unit tests are no longer valid (since they check the old behavior) and you need to change them too.
You'll need to change the way Foo uses Bar. This can break some Foo unit tests where you pass Foo a mock of Bar. That mock was created with the old way Foo was using Bar in mind, so you'll need to change it to match the new, correct way.
Still - this does not mean the new unit tests will be able to catch integration problems!

You called "reset" on the listContainer, and then the handler was in an unexpected state?
When you say "unexpected state", this could mean your test is wrong (the state was changed externally by the test, but the test expects something else, so it is clearly a bad test).
If the handler was corrupted by outside code changing its data, making it out of sync with another internal index maintained by the handler, then the integration test violated encapsulation. You can't unit test things that are not units.
If the handler is one of many classes accessing the listContainer, then it has to use the listContainer's interface. It can't have a bad state if it just asks the listContainer what its state is. In this case the handler is entirely at fault, and you need to add a new unit test to the handler. You can't anticipate everything, but when you find a problem, you can add a unit test which detects it, then fix the code so the test passes.
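To make the last point concrete, here is a minimal sketch in Java (the question's method names are Java-style). Everything in it is an assumption made for illustration: ListContainer and Handler are stand-ins for the asker's classes, getIndex and size are invented accessors, and the rule that a container found at -1 is first moved back to 0 is just one possible policy. The point is only that a handler which keeps no index of its own can be tested against the reset scenario directly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the question's ListClass.
class ListContainer {
    private final List<String> values;
    private int index = -1;

    ListContainer(List<String> values) { this.values = values; }

    int getIndex() { return index; }
    int size() { return values.size(); }
    String getCurrentValue() { return index < 0 ? null : values.get(index); }
    void incrementIndex() { index++; }
    void reset() { index = -1; }
}

// Hypothetical handler: it stores no index of its own and always asks the containers.
class Handler {
    private final List<ListContainer> containers = new ArrayList<>();

    void add(ListContainer container) { containers.add(container); }

    void next() {
        // Assumed policy: a container at -1 (fresh or reset) is first moved to 0.
        boolean movedToStart = false;
        for (ListContainer c : containers) {
            if (c.getIndex() < 0) {
                c.incrementIndex();
                movedToStart = true;
            }
        }
        if (movedToStart) {
            return;
        }
        // Otherwise advance like an odometer, rightmost container first.
        for (int i = containers.size() - 1; i >= 0; i--) {
            ListContainer c = containers.get(i);
            if (c.getIndex() < c.size() - 1) {
                c.incrementIndex();
                return;
            }
            c.reset();
            c.incrementIndex(); // wrap to 0 and carry to the container on the left
        }
    }
}

class ResetScenario {
    public static void main(String[] args) {
        ListContainer first = new ListContainer(Arrays.asList("a", "b"));
        ListContainer second = new ListContainer(Arrays.asList("x", "y"));
        Handler handler = new Handler();
        handler.add(first);
        handler.add(second);

        handler.next(); // indices: 0, 0
        handler.next(); // indices: 0, 1
        first.reset();  // outside code puts the first container back to -1
        handler.next(); // assumed policy: first returns to 0, second is untouched

        System.out.println(first.getIndex() + ", " + second.getIndex());
    }
}
```

With a mocking library, the same scenario could be expressed in a pure handler unit test by stubbing the container to report an index of -1 in the middle of a sequence; the real container is used here only to keep the sketch self-contained.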

Related

Can I make a unit test inconclusive if a requisite unit test fails?

Consider unit testing a dictionary object. The first unit tests you might write are a few that simply add items to the dictionary and check for exceptions. The next test may be something like testing that the count is accurate, or that the dictionary returns a correct list of keys or values.
However, each of these later cases requires that the dictionary can first reliably add items. If the tests which add items fail, we have no idea whether our later tests fail because what they're testing is implemented incorrectly, or because the assumption that we can reliably add items is incorrect.
Can I declare a set of unit tests which cause a given unit test to be inconclusive if any of them fail? If not, how should I best work around this? Have I set up my unit tests wrong, so that I'm running into this predicament?

It's not as hard as it might seem. Let's rephrase the question a bit:
If I test my piece of code which requires System.Collections.Generic.List<T>.Add to work, what should I do when one day Microsoft decides to break .Add on List<T>? Do I make my tests depending on this to work inconclusive?
The answer to the above is obvious: you don't. You let them fail for one simple reason: your assumptions have failed, so the test should fail. It's the same here. Once you get your add tests to work, from that point on you assume add works. You shouldn't treat your own tested code any differently from third-party tested code. Once it's proven to work, you assume it does.
On a different note, you can use a concept called guard assertions. In your remove test, after the arrange phase you introduce an additional assert phase, which verifies your initial assumptions (in this case, that add is working). More information about this technique can be found here.
To add an example, NUnit uses the concept above disguised under the name Theory. This does exactly what you proposed (yet it seems to be more related to data driven testing rather than general utility):
The theory itself is responsible for ensuring that all data supplied meets its assumptions. It does this by use of the Assume.That(...) construct, which works just like Assert.That(...) but does not cause a failure. If the assumption is not satisfied for a particular test case, that case returns an Inconclusive result, rather than a Success or Failure.
However, I think what Mark Seemann states in an answer to the question I linked makes the most sense:
There may be many preconditions that need to be satisfied for a given test case, so you may need more than one Guard Assertion. Instead of repeating those in all tests, having one (and one only) test for each precondition keeps your test code more maintainable, since you will have less repetition that way.
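For readers who have not seen a guard assertion, here is a minimal sketch in plain Java. It assumes nothing about NUnit: java.util.HashMap stands in for the dictionary under test, and check is a hypothetical helper replacing a framework's assert.

```java
import java.util.HashMap;
import java.util.Map;

class GuardAssertionSketch {
    // Hypothetical stand-in for a test framework's assertion.
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    // Tests removal; the guard assertion documents the assumption that adding works.
    static void removeDeletesExistingKey() {
        // Arrange
        Map<String, Integer> dictionary = new HashMap<>();
        dictionary.put("answer", 42);

        // Guard assertion: fail here, with a precondition message, if add is broken.
        check(dictionary.containsKey("answer"), "guard: put() did not store the key");

        // Act
        dictionary.remove("answer");

        // Assert
        check(!dictionary.containsKey("answer"), "remove() left the key in place");
    }

    public static void main(String[] args) {
        removeDeletesExistingKey();
        System.out.println("removeDeletesExistingKey passed");
    }
}
```

If adding ever breaks, this test still fails, but the message says that a precondition was violated rather than that removal is wrong. As the quoted advice notes, a single dedicated test for each precondition gives the same information with less repetition.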

Nice question, I often ponder this and had this problem the other day. What I did was get the basics of our collection working using a dictionary behind the scenes. For example:
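using System.Collections.Generic;

public class MyCollection
{
    // Injected so that early tests can inspect what was stored.
    private readonly IDictionary<string, int> _backingStore;

    public MyCollection(IDictionary<string, int> backingStore)
    {
        _backingStore = backingStore;
    }
}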
Then we test drove the addition implementation. As we had the dictionary by reference we could assert that after adding items our business logic was correct.
For example, the pseudocode for the addition was something like:
public void Add(Item item)
{
// Check we have not added before
// More business logic...
// Add
}
Then the test could be written such as:
var subject = new MyCollection(backingStore);
subject.Add(new Item());
Assert.That(backingStore.Contains(itemThatWeAdded));
We then went on to drive out the other methods such as retrieval, and deletion.
Your question is what you should do when the addition breaks and in turn breaks the retrieval. This is a catch-22 scenario. Personally I'd rather stop exposing the backing store and treat it as an implementation detail, so this is what we did. We refactored the tests to use the system under test, rather than the backing store, for the asserts. The great thing about the backing store being public initially is that it allows you to test drive small parts of the codebase, rather than having to implement both addition and retrieval in one go.
The test for addition then looked like the following after we refactored the collection to not expose the backing store.
var subject = new MyCollection();
var item = new Item();
subject.Add(item);
Assert.That(subject.Has(item), Is.True);
In this case I think this is fine. If you cannot add items successfully, then you certainly cannot retrieve anything, because nothing has been added. As long as your tests are named well, a failing test such as "CanOnlyAddUniqueItemsToCollection" will point future developers in the right direction: in other words, the addition is broken. Just make sure your tests are named well and you will be giving as much help as possible.

I don't see this as too much of a problem. If your Dictionary class is not too big, and the unit test for that class is the only unit test testing that code, then when your add method is broken and multiple tests fail, you still know the problem is in the Dictionary class and can identify it, debug and fix it easily.
Where it becomes a problem is when you have other code smells or design problems such as:
unit tests are testing many application classes; using mocks instead can help here.
unit tests are actually system tests creating and testing many application classes at once.
the Dictionary class is too big and complex so when it breaks and tests fail it's difficult to figure out what part is broken.

This is very interesting. We use NUnit, and as best I can tell it runs test methods alphabetically. That might be an overly artificial way to order your tests, but if you built up your test classes such that alphabetically or numerically named prerequisite methods came first, you might accomplish what you want.
I find myself writing a test method, running just that one to watch it fail, and then writing the code to make it pass. When I'm all done I can run the whole class and everything passes. It doesn't matter what order the tests ran in, because everything 'works' thanks to the incremental development I did.
Now, later on, if I break something in the thing I'm testing, who knows what will fail in the harness. I guess it doesn't really matter to me: I've got a long list of failures and I can tease out what went wrong.

Unit test and ordered tests - best practice

I've got a complex part of a web application, and we're starting now to unit test it in order to ensure that everything works fine, and if any changes will be made, the tests will be there to check whether we broke anything.
This portion of our app is a sort of wizard: You go from step 1 to step N. Each step can fork in different ways depending on what the user chooses. Each step can also contain either only 1 item or a collection of items, like this:
Step 1: Are there items of type X? If yes, how many?
For each item declared -> form to input item data
Step 2: Are there items of type Y? If yes, how many?
For each item declared -> form to input item data (may contain references to items declared in step 1)
etc. It's not all like this, there are exceptions, but it's just to give an idea of how it is. Now this procedure isn't forward-only. The user must be able to jump back to previous nodes and apply changes, add items or remove items from collections, etc. and the software must remember what was the last step he completed before jumping back, so when he's done he can go on.
For unit testing purposes, I am thinking that I can't have standalone tests: if you haven't completed previous steps, you don't have the data for successive steps. Thus I was thinking of writing ordered tests.
I also read that a best practice is to "have the test be independent from one another", and what I thought to do is going against this.
The sample tests I wrote are green if run as an ordered test, but only the first one is green if run as standalone tests.
Now I'd like to hear opinions and if anyone has a correct way to approach this situation.

Each test should indeed be independent of the others. If Test B depends on Test A executing before it, then you have potentially very flaky tests on your hands. In your situation, I'd prefer to set up Test B by pre-configuring a context.
What I mean is, whatever state Test A leaves the system after it has completed, use that to setup the context for Test B e.g.
public void TestB()
{
// Arrange.
SetupSystemLikeTestAHadJustRun();
// Act.
// Do your tests.
// Assert.
}
By having a known, fixed, context at the start of the test, you stand a much better chance of having a good suite of tests.
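As an illustration of such a pre-configured context, here is a minimal Java sketch. WizardState, WizardFixtures and every member on them are hypothetical; the real wizard will look different. The fixture writes the state that a completed step 1 would leave behind, so the step 2 test needs neither the step 1 test nor any particular execution order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the wizard; the real application's types will differ.
class WizardState {
    final List<String> itemsX = new ArrayList<>();
    final List<String> itemsY = new ArrayList<>();
    int lastCompletedStep = 0;

    // Step 2: an item of type Y may reference an item declared in step 1.
    void addItemY(String name, String referencedItemX) {
        if (lastCompletedStep < 1) {
            throw new IllegalStateException("step 1 has not been completed");
        }
        if (!itemsX.contains(referencedItemX)) {
            throw new IllegalArgumentException("unknown item X: " + referencedItemX);
        }
        itemsY.add(name + " -> " + referencedItemX);
        lastCompletedStep = Math.max(lastCompletedStep, 2);
    }
}

class WizardFixtures {
    // Builds the state a completed step 1 would leave behind, without running step 1.
    static WizardState afterStep1() {
        WizardState state = new WizardState();
        state.itemsX.add("x1");
        state.itemsX.add("x2");
        state.lastCompletedStep = 1;
        return state;
    }
}

class Step2Tests {
    static void addingItemYRecordsReferenceToItemX() {
        // Arrange: start from a known, fixed context.
        WizardState state = WizardFixtures.afterStep1();

        // Act
        state.addItemY("y1", "x2");

        // Assert
        if (!state.itemsY.contains("y1 -> x2")) {
            throw new AssertionError("item Y should reference x2");
        }
        if (state.lastCompletedStep != 2) {
            throw new AssertionError("step 2 should be recorded as completed");
        }
    }

    public static void main(String[] args) {
        addingItemYRecordsReferenceToItemX();
        System.out.println("step 2 test passed without running step 1");
    }
}
```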
Alternatively, if you have heard of BDD (Behaviour Driven Development), you could use a BDD tool like SpecFlow or NBehave (for .NET) or Cucumber for Ruby. Using BDD allows you to be more expressive in your testing.

Do's and Don'ts
DO
Name tests with both their expected outcome and relevant details of the state or input being tested
DON'T
Give tests names that say nothing beyond the name of the method being tested except in trivial cases
STRUCTURE
Structure tests in three distinct blocks - arrange, act, and assert.
Unit tests tend to have a very regular structure. A common way to refer to this structure is arrange, act, assert: every test must arrange the state of the world to test, act on the class under test by calling methods, and assert that the world is in the expected state afterward.
The arrange block is for setting up details of the external world specific to the situation under test. This involves things like creating local variables that will be reused in the test, and sometimes instantiating the object under test with specific arguments. This step should not involve any calls to the object under test (do that during the act block) or verifications of initial state (do that during assert, maybe in another test). Note that general setup required by all or many tests should be done in the test's setUp method. If your test doesn't depend on any specific external state, you can skip the arrange block.
The act block is where you actually make calls to the class under test to trigger the behavior that is being tested. Frequently this block will be a single method call, but if the behavior you're testing spans several methods then they will each be called here. Simple arguments can be inlined as part of the method call, but more complex argument expressions are sometimes better off extracted to the arrange block to avoid distracting from the intent of the block. The act block may also assign a method's return value to a local variable so that it can be asserted on later.
The assert block is the place to make assertions on the return values collected and to verify any interactions with mock objects. It can also build values required for the assertions and verifications. In very simple tests, the act and assert blocks are sometimes combined by inlining calls on the class under test into an assert statement.
These blocks should be distinct from one another - the test should not perform any additional setup or stubbing once it makes calls to the class under test in the act block, and it should not make further calls to the class under test once verification begins in the assert block.
It should be clear when glancing at the test where each block starts and ends. Usually this can be done by adding a single blank line between each block (though this isn't necessary in simple tests where the blocks are only one or two lines each). In particularly complex tests, especially ones where you have to set up several different objects, you might want to use blank lines within blocks to make them more readable. In this case, one option to distinguish the blocks is to label each with a comment like // Arrange, // Act, and // Assert.
Tests that emphasize this structure are clearer since they make it easy to navigate different parts of the test, and more likely to be complete since the regular structure helps ensure that the details of the behavior being tested aren't hidden or omitted.
Mocking frameworks interact with this structure in different ways. Most modern frameworks like Mockito allow stubs to be configured in the arrange block along with defining local variables, and mocks to be verified in the assert block along with performing assertions. Some older frameworks like EasyMock unfortunately require the expected behaviors of mocks to be specified before invoking the code under test - this requires a fourth "expect" block before the act block which works in a similar way to the assert block.
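A minimal sketch of the three blocks in plain Java follows. It assumes no test framework: java.util.ArrayList stands in for the class under test, and a thrown AssertionError stands in for a failed assertion. The object under test is instantiated with specific arguments in the arrange block, and no calls are made on it until the act block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArrangeActAssertSketch {
    // The name states the input being tested and the expected outcome.
    static void removingOnlyItemLeavesListEmpty() {
        // Arrange
        List<String> list = new ArrayList<>(Arrays.asList("only item"));

        // Act
        boolean removed = list.remove("only item");

        // Assert
        if (!removed) {
            throw new AssertionError("remove() should report that the item was found");
        }
        if (!list.isEmpty()) {
            throw new AssertionError("list should be empty after removing its only item");
        }
    }

    public static void main(String[] args) {
        removingOnlyItemLeavesListEmpty();
        System.out.println("removingOnlyItemLeavesListEmpty passed");
    }
}
```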

I would use a tool like Cucumber or Selenium to test the graphical stuff you describe. You can use a unit test framework like JUnit or NUnit to write these kinds of tests, but these don't really support running ordered tests.

What is the Pattern for Unit Testing flow control

I have a method that checks some assumptions and either follows the happy path, or terminates along the unhappy paths. I've either designed it poorly, or I'm missing the method for testing the control flow.
if (this.officeInfo.OfficeClosed)
{
this.phoneCall.InformCallerThatOfficeIsClosedAndHangUp();
return;
}
if (!this.operators.GetAllOperators().Any())
{
this.phoneCall.InformCallerThatNoOneIsAvailableAndSendToVoicemail();
return;
}
Call call = null;
// "operator" is a reserved word in C#, so the loop variable is named op.
foreach (var op in this.operators.GetAllOperators())
{
    call = op.Call();
    if (call != null) { break; }
}
and so on. I've got my dependencies injected. I've got my mocks moq'd. I can make sure that this or that is called, but I don't know how to test that the "return" happens. If TDD means I don't write a line until I have a test that fails without it, I'm stuck.
How would you test it? Or is there a way to write it that makes it more testable?
Update: Several answers have been posted saying that I should test the resultant calls, not the flow control. The problem I have with this approach is that every test is required to set up and test the state and results of the other tests. This seems really unwieldy and brittle. Shouldn't I be able to test the first if clause alone, and then test the second one alone? Do I really need to have an exponentially expanding set of tests that start looking like Method_WithParameter_DoesntInvokeMethod8IfMethod7IsTrueandMethod6IsTrueAndMethod5IsTrueAndMethod4IsTrueAndMethod3IsFalseAndMethod2IsTrueAndMethod1isAaaaccck()?

I think you want to test the program's outputs: for example, that when this.officeInfo.OfficeClosed then the program does invoke this.phoneCall.InformCallerThatOfficeIsClosedAndHangUp() and does not invoke other methods such as this.operators.GetAllOperators().
I think that your test does this by asking its mock objects (phoneCall, etc.) which of their methods were invoked, or by getting them to throw an exception if any of their methods are invoked unexpectedly.
One way to do it is to make a log file of the program's inputs (e.g. 'OfficeClosed returns true') and outputs: then run the test, let the test generate the log file, and then assert that the contents of the generated log file match the expected log file contents for that test.
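The mock-based check described above can be sketched without a mocking library. This is a hypothetical Java rendering of the question's C# method: CallRouter, the two interfaces and the hand-written fakes are assumptions made for illustration. The early return is observed indirectly, because the fake for the next collaborator fails the test if it is ever touched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

interface PhoneCall {
    void informCallerThatOfficeIsClosedAndHangUp();
    void informCallerThatNoOneIsAvailableAndSendToVoicemail();
}

interface Operators {
    List<String> getAllOperators();
}

// Hypothetical Java rendering of the method from the question.
class CallRouter {
    private final boolean officeClosed;
    private final PhoneCall phoneCall;
    private final Operators operators;

    CallRouter(boolean officeClosed, PhoneCall phoneCall, Operators operators) {
        this.officeClosed = officeClosed;
        this.phoneCall = phoneCall;
        this.operators = operators;
    }

    void route() {
        if (officeClosed) {
            phoneCall.informCallerThatOfficeIsClosedAndHangUp();
            return;
        }
        if (operators.getAllOperators().isEmpty()) {
            phoneCall.informCallerThatNoOneIsAvailableAndSendToVoicemail();
            return;
        }
        // Dialing the operators is omitted from this sketch.
    }
}

// Hand-written fake that records which methods were invoked.
class RecordingPhoneCall implements PhoneCall {
    final List<String> invoked = new ArrayList<>();

    public void informCallerThatOfficeIsClosedAndHangUp() { invoked.add("officeClosed"); }
    public void informCallerThatNoOneIsAvailableAndSendToVoicemail() { invoked.add("voicemail"); }
}

// Fake that fails the test if the code under test gets past the first return.
class UntouchableOperators implements Operators {
    public List<String> getAllOperators() {
        throw new AssertionError("operators must not be consulted when the office is closed");
    }
}

class EarlyReturnTests {
    static void closedOfficeHangsUpAndStops() {
        RecordingPhoneCall phoneCall = new RecordingPhoneCall();
        new CallRouter(true, phoneCall, new UntouchableOperators()).route();
        if (!phoneCall.invoked.equals(Collections.singletonList("officeClosed"))) {
            throw new AssertionError("unexpected calls: " + phoneCall.invoked);
        }
    }

    static void noOperatorsSendsToVoicemail() {
        RecordingPhoneCall phoneCall = new RecordingPhoneCall();
        new CallRouter(false, phoneCall, () -> new ArrayList<String>()).route();
        if (!phoneCall.invoked.equals(Collections.singletonList("voicemail"))) {
            throw new AssertionError("unexpected calls: " + phoneCall.invoked);
        }
    }

    public static void main(String[] args) {
        closedOfficeHangsUpAndStops();
        noOperatorsSendsToVoicemail();
        System.out.println("both early-return tests passed");
    }
}
```

Each test arranges only the condition it cares about, so the first if clause is tested alone and the second one alone; removing either return statement from route makes the corresponding test fail.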

I'm not sure that's really the right approach. You care about whether or not the method produced the expected result, not necessarily how control "flowed" through the particular method. For example, if phoneCall.InformCallerThatOfficeIsClosedAndHangUp is called, then I assume some result is recorded somewhere. So in your unit test, you would be asserting that result was indeed recorded (either by checking a database record, file, etc.).
With that said, it's important to ensure that your unit tests indeed cover your code. For that, you can use a tool like NCover to ensure that all of your code is being exercised. It'll generate a coverage report which will show you exactly which lines were executed by your unit tests and, more importantly, which ones weren't.

You could go ballistic and use a strategy pattern. Something along the lines of having an interface IHandleCall, with a single void method DoTheRightThing(), and 3 classes HandleOfficeIsClosed, HandleEveryoneIsBusy, HandleGiveFirstOperatorAvailable, which implement the interface. And then have code like:
IHandleCall handleCall;
if (this.officeInfo.OfficeClosed)
{
handleCall = new HandleOfficeIsClosed();
}
else if (/* other condition */)
{
handleCall = new OtherImplementation();
}
handleCall.DoTheRightThing();
return;
That way you can get rid of the multiple return points in your method. Note that this is a very dirty outline, but essentially at that point you should extract the if/else into some factory, and then the only thing you have to test is that your class calls the factory, and that handleCall.DoTheRightThing() is called - (and of course that the factory returns the right strategy).
In any case, because you have already guarded against no operator available, you could simplify the end to:
// "operator" is a reserved word in C#, so the variable is named op.
var op = this.operators.FindFirst();
call = op.Call();
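The factory extraction suggested above might look like the following Java sketch. The strategy names come from the answer; HandleCallFactory, its create method and its two parameters are assumptions made for illustration, and the strategy bodies only print a message.

```java
interface HandleCall {
    void doTheRightThing();
}

class HandleOfficeIsClosed implements HandleCall {
    public void doTheRightThing() {
        System.out.println("inform caller that the office is closed and hang up");
    }
}

class HandleEveryoneIsBusy implements HandleCall {
    public void doTheRightThing() {
        System.out.println("inform caller that no one is available and send to voicemail");
    }
}

class HandleGiveFirstOperatorAvailable implements HandleCall {
    public void doTheRightThing() {
        System.out.println("connect caller to the first available operator");
    }
}

// Hypothetical factory holding the if/else that used to live in the method.
class HandleCallFactory {
    static HandleCall create(boolean officeClosed, int operatorCount) {
        if (officeClosed) {
            return new HandleOfficeIsClosed();
        }
        if (operatorCount == 0) {
            return new HandleEveryoneIsBusy();
        }
        return new HandleGiveFirstOperatorAvailable();
    }
}

class StrategyDemo {
    public static void main(String[] args) {
        // The calling code has a single exit point and no branching of its own.
        HandleCall handleCall = HandleCallFactory.create(false, 0);
        handleCall.doTheRightThing();
    }
}
```

The branching is now tested once, against the factory, by checking which strategy type comes back for each combination of inputs; each strategy is then tested on its own.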

Don't test the flow control, just test the expected behavior. That is, unit testing does not care about the implementation details, only that the behavior of the method matches the specifications of the method. So if Add(int x, int y) should produce the result 4 on input x = 2, y = 2, then test that the output is 4 but don't worry about how Add produced the result.
To put it another way, unit testing should be invariant under implementation details and refactoring. But if you're testing implementation details in your unit tests, then you can't refactor without breaking them. For example, if you implement a method GetPrime(int k) to return the kth prime, then check that GetPrime(10) returns 29, but don't test the flow control inside the method. If you implement GetPrime using the Sieve of Eratosthenes and have tested the flow control inside the method, and later refactor to use the Sieve of Atkin, your unit tests will break. Again, all that matters is that GetPrime(10) returns 29, not how it does it.
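As a small illustration, here is a Java sketch of a behaviour-only test. The class name Primes and the growing-range sieve are assumptions; the check at the end would stay exactly the same if the body were swapped for another algorithm.

```java
class Primes {
    // Returns the k-th prime (1-based) using a Sieve of Eratosthenes over a growing range.
    static int getPrime(int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        for (int limit = 16; ; limit *= 2) {
            boolean[] composite = new boolean[limit + 1];
            int found = 0;
            for (int n = 2; n <= limit; n++) {
                if (composite[n]) {
                    continue;
                }
                found++;
                if (found == k) {
                    return n;
                }
                // Cross out multiples of n, starting at n * n.
                for (long m = (long) n * n; m <= limit; m += n) {
                    composite[(int) m] = true;
                }
            }
            // Fewer than k primes up to this limit: double the range and retry.
        }
    }
}

class PrimesTest {
    public static void main(String[] args) {
        // Only the observable result is checked, never the path taken inside getPrime.
        if (Primes.getPrime(10) != 29) {
            throw new AssertionError("the 10th prime should be 29");
        }
        System.out.println("getPrime(10) == 29");
    }
}
```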

If you are stuck using TDD it's a good thing: it means that TDD drives your design and you are looking into how to change it so you can test it.
You can either:
1) verify state: check SUT state after SUT execution or
2) verify behavior: check that mock object calls complied with test expectations
If you don't like how either of these approaches look in your test it's time to refactor the code.

The pattern described by Aaron Feng and K. Scott Allen would solve my problem and its testability concerns. The only issue I see is that it requires all the computation to be performed up front. The decision data object needs to be populated before all of the conditionals. That's great unless it requires successive round trips to the persistence storage.

Is this unit test excessive?

Given the following SUT, would you consider this unit test to be unnecessary?
**edit : we cannot assume the names will match, so reflection wouldn't work.
**edit 2 : in actuality, this class would implement an IMapper interface and there would be full blown behavioral (mock) testing at the business logic layer of the application. this test just happens to be the lowest level of testing that must be state based. I question whether this test is truly necessary because the test code is almost identical to the source code itself, and based off of actual experience I don't see how this unit test makes maintenance of the application any easier.
//SUT
public class Mapper
{
public void Map(DataContract from, DataObject to)
{
to.Value1 = from.Value1;
to.Value2 = from.Value2;
....
to.Value100 = from.Value100;
}
}
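//Unit Test
// NUnit attributes are assumed here; use your framework's equivalents.
[TestFixture]
public class MapperTest
{
    [Test]
    public void Map_CopiesEveryValue()
    {
        DataContract contract = new DataContract(){... } ;
        // "do" is a reserved word in C#, so the target is named dataObject.
        DataObject dataObject = new DataObject(){...};
        Mapper mapper = new Mapper();
        mapper.Map(contract, dataObject);
        // AreEqual takes (expected, actual).
        Assert.AreEqual(contract.Value1, dataObject.Value1);
        ...
        Assert.AreEqual(contract.Value100, dataObject.Value100);
    }
}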

I would question the construct itself, not the need to test it.
[reflection would be far less code]

I'd argue that it is necessary.
However, it would be better as 100 separate unit tests, each that check one value.
That way, when something goes wrong with Value65, you can run the tests and immediately find that Value65 and Value66 are being transposed.
Really, it's with this kind of simple code that you switch your brain off and forget that errors happen. Having tests in place means you pick them up, and not your customers.
However, if you have a class with 100 properties all named ValueXXX, then you might be better using an Array or a List.

It is not excessive. I'm not sure it fully focuses on what you want to test.
"Under the strict definition, for QA purposes, the failure of a UnitTest implicates only one unit. You know exactly where to search to find the bug."
The power of a unit test is in having a known correct resultant state, so the focus should be the values assigned to DataContract. Those are the bounds we want to push, to ensure that all possible values for DataContract can be successfully copied into DataObject. DataContract must be populated with edge case values.
PS. David Kemp is right: 100 well-designed tests would be the most true to the concept of unit testing.
Note : For this test we must assume that DataContract populates perfectly when built (that requires separate tests).

It would be better if you could test at a higher level, i.e. the business logic that requires you to create the Mapper.Map() function.

Not if this was the only unit test of this kind in the entire app. However, the second another like it showed up, you'd see me scrunch my eyebrows and start thinking about reflection.

Not excessive.
I agree the code looks strange, but that said:
The beauty of a unit test is that once it is done it is there forever, so if anyone for any reason decides to change that implementation for something more "clever", the test should still pass, so it is not a big deal.
I personally would probably have a perl script to generate the code as I would get bored of replacing the numbers for each assert, and I would probably make some mistakes on the way, and the perl script (or what ever script) would be faster for me.

Why should unit tests test only one thing?

What Makes a Good Unit Test? says that a test should test only one thing. What is the benefit from that?
Wouldn't it be better to write somewhat bigger tests that cover bigger blocks of code? Investigating a test failure is hard anyway, and I don't see how smaller tests help with it.
Edit: The word unit is not that important. Let's say I consider the unit to be a bit bigger. That is not the issue here. The real question is why write one or more tests for every method, when a few tests that cover many methods are simpler.
An example: a list class. Why should I write separate tests for addition and removal? One test that first adds and then removes sounds simpler.

Testing only one thing will isolate that one thing and prove whether or not it works. That is the idea with unit testing. Nothing wrong with tests that test more than one thing, but that is generally referred to as integration testing. They both have merits, based on context.
To use an example, if your bedside lamp doesn't turn on, and you replace the bulb and switch the extension cord, you don't know which change fixed the issue. Should have done unit testing, and separated your concerns to isolate the problem.
Update: I read this article and linked articles and I gotta say, I'm shook: https://techbeacon.com/app-dev-testing/no-1-unit-testing-best-practice-stop-doing-it
There is substance here and it gets the mental juices flowing. But I reckon that it jibes with the original sentiment that we should be doing the test that context demands. I suppose I'd just append that to say that we need to get closer to knowing for sure the benefits of different testing on a system and less of a cross-your-fingers approach. Measurements/quantifications and all that good stuff.

I'm going to go out on a limb here, and say that the "only test one thing" advice isn't actually as helpful as it's sometimes made out to be.
Sometimes tests take a certain amount of setting up. Sometimes they may even take a certain amount of time to set up (in the real world). Often you can test two actions in one go.
Pro: only have all that setup occur once. Your tests after the first action will prove that the world is how you expect it to be before the second action. Less code, faster test run.
Con: if either action fails, you'll get the same result: the same test will fail. You'll have less information about where the problem is than if you only had a single action in each of two tests.
In reality, I find that the "con" here isn't much of a problem. The stack trace often narrows things down very quickly, and I'm going to make sure I fix the code anyway.
A slightly different "con" here is that it breaks the "write a new test, make it pass, refactor" cycle. I view that as an ideal cycle, but one which doesn't always mirror reality. Sometimes it's simply more pragmatic to add an extra action and check (or possibly just another check to an existing action) in a current test than to create a new one.

Tests that check for more than one thing aren't usually recommended because they are more tightly coupled and brittle. If you change something in the code, it'll take longer to change the test, since there are more things to account for.
[Edit:]
Ok, say this is a sample test method:
[TestMethod]
public void TestSomething() {
// Test condition A
// Test condition B
// Test condition C
// Test condition D
}
If your test for condition A fails, the checks for B, C, and D never run, so the test tells you nothing about them. What if your code change would have caused C to fail as well? If you had split them out into 4 separate tests, you would know this.

Haaa... unit tests.
Push any "directives" too far and it rapidly becomes unusable.
Having a single unit test check a single thing is just as good a practice as having a single method do a single task. But IMHO that does not mean a single test can only contain a single assert statement.
Whether
@Test
public void checkNullInputFirstArgument(){...}
@Test
public void checkNullInputSecondArgument(){...}
@Test
public void checkOverInputFirstArgument(){...}
...
is better than
@Test
public void testLimitConditions(){...}
is a question of taste in my opinion rather than good practice. I personally much prefer the latter.
But
@Test
public void doesWork(){...}
is actually what the "directive" wants you to avoid at all costs, and what drains my sanity the fastest.
In conclusion, group together things that are semantically related and easily testable together, so that a failed test message, by itself, is meaningful enough for you to go directly to the code.
Rule of thumb for a failed test report: if you have to read the test's code first, then your tests are not structured well enough and need splitting into smaller tests.
My 2 cents.

Think of building a car. If you were to apply your theory, of just testing big things, then why not make a test to drive the car through a desert. It breaks down. Ok, so tell me what caused the problem. You can't. That's a scenario test.
A functional test may be to turn on the engine. It fails. But that could be because of a number of reasons. You still couldn't tell me exactly what caused the problem. We're getting closer though.
A unit test is more specific, and will firstly identify where the code is broken, but it will also (if doing proper TDD) help architect your code into clear, modular chunks.
Someone mentioned using the stack trace. Forget it. That's a second resort. Going through the stack trace or stepping through a debugger is a pain and can be time-consuming, especially on larger systems and with complex bugs.
Good characteristics of a unit test:
Fast (milliseconds)
Independent. It's not affected by or dependent on other tests
Clear. It shouldn't be bloated, or contain a huge amount of setup.

Using test-driven development, you would write your tests first, then write the code to pass the test. If your tests are focused, this makes writing the code to pass the test easier.
For example, I might have a method that takes a parameter. One of the things I might think of first is, what should happen if the parameter is null? It should throw an ArgumentNullException (I think). So I write a test that checks whether that exception is thrown when I pass a null argument. I run the test. Okay, it throws NotImplementedException. I fix that by changing the code to throw an ArgumentNullException. I run my test again and it passes. Then I think, what happens if it's too small or too big? Ah, that's two tests. I write the too-small case first.
The point is I don't think of the behavior of the method all at once. I build it incrementally (and logically) by thinking about what it should do, then implementing the code and refactoring as I go to make it look pretty (elegant). This is why tests should be small and focused: when you are thinking about the behavior, you should develop in small, understandable increments.
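A minimal Python sketch of where those increments might end up, assuming a hypothetical `set_volume` function with an allowed range of 0 to 10 (the name and the range are invented for illustration). Each test corresponds to one question asked along the way: null, too small, too big.

```python
import unittest


def set_volume(level):
    """Hypothetical function, grown one failing test at a time."""
    if level is None:
        raise TypeError("level must not be None")  # added for the first test
    if level < 0:
        raise ValueError("level is too small")     # added for the second test
    if level > 10:
        raise ValueError("level is too big")       # added for the third test
    return level


class SetVolumeTest(unittest.TestCase):
    def test_rejects_none(self):
        with self.assertRaises(TypeError):
            set_volume(None)

    def test_rejects_too_small(self):
        with self.assertRaises(ValueError):
            set_volume(-1)

    def test_rejects_too_big(self):
        with self.assertRaises(ValueError):
            set_volume(11)


# Run the test case programmatically and summarise the outcome.
result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(SetVolumeTest).run(result)
print(result.testsRun, "run,", len(result.failures), "failed")
```

Python has no direct counterpart to ArgumentNullException, so `TypeError` is used here as a stand-in.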

Having tests that verify only one thing makes troubleshooting easier. It's not to say you shouldn't also have tests that do test multiple things, or multiple tests that share the same setup/teardown.
Here is an illustrative example. Let's say that you have a stack class with queries:
getSize
isEmpty
getTop
and methods to mutate the stack
push(anObject)
pop()
Now, consider the following test case for it (I'm using Python like pseudo-code for this example.)
class TestCase():
    def setup(self):
        self.stack = Stack()

    def test(self):
        self.stack.push(1)
        self.stack.push(2)
        self.stack.pop()
        assert self.stack.top() == 1, "top() isn't showing correct object"
        assert self.stack.getSize() == 1, "getSize() call failed"
From this test case, you can determine if something is wrong, but not whether it is isolated to the push() or pop() implementations, or the queries that return values: top() and getSize().
If we add individual test cases for each method and its behavior, things become much easier to diagnose. Also, by doing fresh setup for each test case, we can guarantee that the problem is completely within the methods that the failing test method called.
    def test_size(self):
        assert self.stack.getSize() == 0
        assert self.stack.isEmpty()

    def test_push(self):
        self.stack.push(1)
        assert self.stack.top() == 1, "top returns wrong object after push"
        assert self.stack.getSize() == 1, "getSize wrong after push"

    def test_pop(self):
        self.stack.push(1)
        self.stack.pop()
        assert self.stack.getSize() == 0, "getSize wrong after pop"
As far as test-driven development is concerned, I personally write larger "functional tests" that end up testing multiple methods at first, and then create unit tests as I start to implement individual pieces.
Another way to look at it is unit tests verify the contract of each individual method, while larger tests verify the contract that the objects and the system as a whole must follow.
I'm still using three method calls in test_push, however both top() and getSize() are queries that are tested by separate test methods.
You could get similar functionality by adding more asserts to the single test, but then later assertion failures would be hidden.
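For readers who want to run the idea, here is a self-contained sketch using Python's `unittest`. The list-backed `Stack` is an assumption made only so the tests have something to exercise; `setUp` gives every test method a fresh stack, which is the "fresh setup for each test case" mentioned above.

```python
import unittest


class Stack:
    """Minimal list-backed stack, assumed here only to make the tests runnable."""

    def __init__(self):
        self._items = []

    def push(self, obj):
        self._items.append(obj)

    def pop(self):
        return self._items.pop()

    def top(self):
        return self._items[-1]

    def getSize(self):
        return len(self._items)

    def isEmpty(self):
        return not self._items


class StackTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, so no state leaks between tests.
        self.stack = Stack()

    def test_new_stack_is_empty(self):
        self.assertEqual(self.stack.getSize(), 0)
        self.assertTrue(self.stack.isEmpty())

    def test_push(self):
        self.stack.push(1)
        self.assertEqual(self.stack.top(), 1, "top returns wrong object after push")
        self.assertEqual(self.stack.getSize(), 1, "getSize wrong after push")

    def test_pop(self):
        self.stack.push(1)
        self.stack.pop()
        self.assertEqual(self.stack.getSize(), 0, "getSize wrong after pop")


# Run the test case programmatically and summarise the outcome.
result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(StackTest).run(result)
print(result.testsRun, "run,", len(result.failures), "failed")
```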

If you are testing more than one thing then it is called an Integration test...not a unit test. You would still run these integration tests in the same testing framework as your unit tests.
Integration tests are generally slower. Unit tests are fast because all dependencies are mocked/faked, so there are no database/web service/slow service calls.
We run our unit tests on commit to source control, and our integration tests only get run in the nightly build.
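To make "all dependencies are mocked/faked" concrete, here is a small Python sketch using `unittest.mock.Mock` from the standard library. `PriceService` and its `rate_client` collaborator are hypothetical names invented for this example; in an integration test the real client would be used instead of the mock.

```python
from unittest.mock import Mock


class PriceService:
    """Hypothetical class under test; it depends on a (slow) rate client."""

    def __init__(self, rate_client):
        self.rate_client = rate_client

    def price_in(self, currency, amount_usd):
        return round(amount_usd * self.rate_client.get_rate(currency), 2)


def test_price_uses_rate_from_client():
    # The mock replaces the web service, so the test runs entirely in memory.
    client = Mock()
    client.get_rate.return_value = 0.5

    service = PriceService(client)

    assert service.price_in("EUR", 10) == 5.0
    client.get_rate.assert_called_once_with("EUR")


test_price_uses_rate_from_client()
print("unit test passed without touching any real service")
```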

If you test more than one thing and the first thing you test fails, you will not know if the subsequent things you are testing pass or fail. It is easier to fix when you know everything that will fail.

Smaller unit tests make it clearer where the issue is when they fail.

The glib, but hopefully still useful, answer is that unit = one. If you test more than one thing, then you aren't unit testing.

Regarding your example: If you are testing add and remove in the same unit test, how do you verify that the item was ever added to your list? That is why you need to add and verify that it was added in one test.
Or to use the lamp example: If you want to test your lamp and all you do is turn the switch on and then off, how do you know the lamp ever turned on? You must take the step in between to look at the lamp and verify that it is on. Then you can turn it off and verify that it turned off.

I support the idea that unit tests should only test one thing. I also stray from it quite a bit. Today I had a test where expensive setup seemed to be forcing me to make more than one assertion per test.
namespace Tests.Integration
{
    [TestFixture]
    public class FeeMessageTest
    {
        [Test]
        public void ShouldHaveCorrectValues()
        {
            var fees = CallSlowRunningFeeService();
            Assert.AreEqual(6.50m, fees.ConvenienceFee);
            Assert.AreEqual(2.95m, fees.CreditCardFee);
            Assert.AreEqual(59.95m, fees.ChangeFee);
        }
    }
}
At the same time, I really wanted to see all my assertions that failed, not just the first one. I was expecting them all to fail, and I needed to know what amounts I was really getting back. But, a standard [SetUp] with each test divided would cause 3 calls to the slow service. Suddenly I remembered an article suggesting that using "unconventional" test constructs is where half the benefit of unit testing is hidden. (I think it was a Jeremy Miller post, but can't find it now.) Suddenly [TestFixtureSetUp] popped to mind, and I realized I could make a single service call but still have separate, expressive test methods.
namespace Tests.Integration
{
    [TestFixture]
    public class FeeMessageTest
    {
        Fees fees;

        [TestFixtureSetUp]
        public void FetchFeesMessageFromService()
        {
            fees = CallSlowRunningFeeService();
        }

        [Test]
        public void ShouldHaveCorrectConvenienceFee()
        {
            Assert.AreEqual(6.50m, fees.ConvenienceFee);
        }

        [Test]
        public void ShouldHaveCorrectCreditCardFee()
        {
            Assert.AreEqual(2.95m, fees.CreditCardFee);
        }

        [Test]
        public void ShouldHaveCorrectChangeFee()
        {
            Assert.AreEqual(59.95m, fees.ChangeFee);
        }
    }
}
There is more code in this test, but it provides much more value by showing me all the values that don't match expectations at once.
A colleague also pointed out that this is a bit like Scott Bellware's specunit.net: http://code.google.com/p/specunit-net/

Another practical disadvantage of very granular unit testing is that it breaks the DRY principle. I have worked on projects where the rule was that each public method of a class had to have a unit test (a [TestMethod]). Obviously this added some overhead every time you created a public method but the real problem was that it added some "friction" to refactoring.
It's similar to method-level documentation: it's nice to have, but it's another thing that has to be maintained, and it makes changing a method signature or name a little more cumbersome and slows down "floss refactoring" (as described in "Refactoring Tools: Fitness for Purpose" by Emerson Murphy-Hill and Andrew P. Black).
Like most things in design, there is a trade-off that the phrase "a test should test only one thing" doesn't capture.

When a test fails, there are three options:
The implementation is broken and should be fixed.
The test is broken and should be fixed.
The test is no longer needed and should be removed.
Fine-grained tests with descriptive names help the reader to know why the test was written, which in turn makes it easier to know which of the above options to choose. The name of the test should describe the behaviour which is being specified by the test - and only one behaviour per test - so that just by reading the names of the tests the reader will know what the system does. See this article for more information.
On the other hand, if one test is doing lots of different things and it has a non-descriptive name (such as tests named after methods in the implementation), then it will be very hard to find out the motivation behind the test, and it will be hard to know when and how to change the test.
Here is what it can look like (with GoSpec) when each test tests only one thing:
func StackSpec(c gospec.Context) {
    stack := NewStack()
    c.Specify("An empty stack", func() {
        c.Specify("is empty", func() {
            c.Then(stack).Should.Be(stack.Empty())
        })
        c.Specify("After a push, the stack is no longer empty", func() {
            stack.Push("foo")
            c.Then(stack).ShouldNot.Be(stack.Empty())
        })
    })
    c.Specify("When objects have been pushed onto a stack", func() {
        stack.Push("one")
        stack.Push("two")
        c.Specify("the object pushed last is popped first", func() {
            x := stack.Pop()
            c.Then(x).Should.Equal("two")
        })
        c.Specify("the object pushed first is popped last", func() {
            stack.Pop()
            x := stack.Pop()
            c.Then(x).Should.Equal("one")
        })
        c.Specify("After popping all objects, the stack is empty", func() {
            stack.Pop()
            stack.Pop()
            c.Then(stack).Should.Be(stack.Empty())
        })
    })
}

The real question is: why write one or more tests for every method, when a few tests that each cover many methods would be simpler?
Well, so that when some test fails you know which method fails.
When you have to repair a non-functioning car, it is easier when you know which part of the engine is failing.
An example: a list class. Why should I make separate tests for addition and removal? One test that first adds and then removes sounds simpler.
Let's suppose that the addition method is broken and does not add, and that the removal method is broken and does not remove. Your test would check that the list, after addition and removal, has the same size as initially. Your test would pass, even though both of your methods are broken.
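That scenario is easy to reproduce. In the Python sketch below, `BrokenList` is a hypothetical class whose `add` and `remove` both silently do nothing; the combined check passes, while a separate check of `add` alone exposes the bug.

```python
class BrokenList:
    """Hypothetical list whose add and remove both silently do nothing."""

    def __init__(self):
        self._items = []

    def add(self, item):
        pass  # bug: never stores the item

    def remove(self, item):
        pass  # bug: never removes anything

    def size(self):
        return len(self._items)


def combined_check(lst):
    """Add then remove, and compare only the final size with the initial one."""
    before = lst.size()
    lst.add("x")
    lst.remove("x")
    return lst.size() == before


def separate_add_check(lst):
    """Check the effect of add on its own."""
    before = lst.size()
    lst.add("x")
    return lst.size() == before + 1


print(combined_check(BrokenList()))      # True: two bugs cancel out, test "passes"
print(separate_add_check(BrokenList()))  # False: the broken add is detected
```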

Disclaimer: This is an answer highly influenced by the book "xUnit Test Patterns".
Testing only one thing at each test is one of the most basic principles that provides the following benefits:
Defect Localization: If a test fails, you immediately know why it failed (ideally without further troubleshooting, if you've done a good job with the assertions used).
Test as a specification: the tests are not only there as a safety net, but can easily be used as specification/documentation. For instance, a developer should be able to read the unit tests of a single component and understand the API/contract of it, without needing to read the implementation (leveraging the benefit of encapsulation).
Feasibility of TDD: TDD is based on having small-sized chunks of functionality and completing progressive iterations of (write failing test, write code, verify test succeeds). This process gets highly disrupted if a test has to verify multiple things.
Lack of side-effects: Somewhat related to the first one, but when a test verifies multiple things, it's more likely to be tied to other tests as well. Such tests might need a shared test fixture, which means that one can be affected by another. So, eventually you might have a test failing when in reality another test is the one that caused the failure, e.g. by changing the fixture data.
I can only see a single reason why you might benefit from having a test that verifies multiple things, but this should be seen as a code smell actually:
Performance optimisation: There are some cases where your tests are not running only in memory, but also depend on persistent storage (e.g. databases). In some of these cases, having a test verify multiple things might help reduce the number of disk accesses, thus decreasing the execution time. However, unit tests should ideally be executable only in memory, so if you stumble upon such a case, you should reconsider whether you are going down the wrong path. All persistent dependencies should be replaced with mock objects in unit tests. End-to-end functionality should be covered by a different suite of integration tests. This way, you do not need to care about execution time anymore, since integration tests are usually executed by build pipelines and not by developers, so a slightly higher execution time has almost no impact on the efficiency of the software development lifecycle.
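The "lack of side-effects" point above can be shown with a small Python sketch. The fixture and the two checks are hypothetical; the sketch only contrasts a shared, mutable fixture with a fresh fixture per check.

```python
def make_users():
    """Fresh fixture: a new list on every call."""
    return ["alice", "bob"]


def check_removal(users):
    users.remove("bob")  # mutates whatever fixture it is given
    return users == ["alice"]


def check_count(users):
    return len(users) == 2


# Shared fixture: the second check fails only because the first one ran.
shared = make_users()
shared_results = (check_removal(shared), check_count(shared))

# Fresh fixture per check: each check stands on its own.
fresh_results = (check_removal(make_users()), check_count(make_users()))

print(shared_results)  # (True, False)
print(fresh_results)   # (True, True)
```

With the shared fixture, `check_count` is reported as failing even though the code it checks is fine; the real cause is the mutation made by the other check.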
