Partially mocking the internal wiring of the SUT to DRY up unit tests

I'm sorry, it's a very long post. I've read almost everything on the subject, and I'm not yet convinced that it's a bad idea to partially mock the SUT just to DRY up the tests. So, I need to first address all the arguments against it in order to avoid repetitive answers. Please bear with me.
Have you ever felt the urge to partially mock out the SUT itself, in order to make tests more DRY? Fewer mocks, less madness, more readable tests?!
Let's have an example at hand to discuss the subject more clearly:
class Sut
{
    public function fn0(...)
    {
        // Interacts with: Dp0
    }

    public function fn1(...)
    {
        // Calls: fn0
        // Interacts with: Dp1
    }

    public function fn2(...)
    {
        // Calls: fn0
        // Interacts with: Dp1
    }

    public function fn3(...)
    {
        // Calls: fn2
    }

    public function fn4(...)
    {
        // Calls: fn1(), fn2(), fn3()
        // Interacts with: Dp2
    }
}
Now, let's test the behavior of the SUT. Each fn*() is representing a behavior of the class under test. Here, I'm not trying to unit-test each single method of the SUT, but its exposed behaviors.
class SutTest extends \PHPUnit_Framework_TestCase
{
    /**
     * @covers Sut::fn0
     */
    public function testFn0()
    {
        // Mock Dp0
    }

    /**
     * @covers Sut::fn1
     */
    public function testFn1()
    {
        // Mock Dp1, which is a direct dependency
        // Mock Dp0, which is an indirect dependency
    }

    /**
     * @covers Sut::fn2
     */
    public function testFn2()
    {
        // Mock Dp1 with different expectations than testFn1()
        // Mock Dp0 with different expectations
    }

    /**
     * @covers Sut::fn3
     */
    public function testFn3()
    {
        // Mock Dp1, again with different expectations
        // Mock Dp0, with different expectations
    }

    /**
     * @covers Sut::fn4
     */
    public function testFn4()
    {
        // Mock Dp2, which is a direct dependency
        // Mock Dp0, Dp1 as indirect dependencies
    }
}
You get the terrible idea! You need to keep repeating yourself. It's not DRY at all. And as the expectations of each mock object may differ for each test, you can't just mock out all the dependencies and set expectations once for the whole testcase. You need to be explicit about the behavior of the mock for each test.
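To make the duplication concrete, here's a minimal sketch in Python's `unittest.mock` (the `Sut`, `Dp0`/`Dp1` wiring and return values are hypothetical, mirroring the pseudo-code): every test has to wire up both collaborators again, just with different canned values.

```python
from unittest.mock import Mock

class Sut:
    """Hypothetical SUT wired to two collaborators, as in the pseudo-code."""
    def __init__(self, dp0, dp1):
        self.dp0, self.dp1 = dp0, dp1

    def fn0(self):
        return self.dp0.load()

    def fn1(self):
        return self.fn0() + self.dp1.transform("fn1")

    def fn2(self):
        return self.fn0() + self.dp1.transform("fn2")

def test_fn1():
    # Every test must stub BOTH collaborators, even though
    # fn1 only talks to dp1 directly; dp0 is an indirect dependency.
    dp0 = Mock(**{"load.return_value": 1})
    dp1 = Mock(**{"transform.return_value": 10})
    assert Sut(dp0, dp1).fn1() == 11

def test_fn2():
    # The same wiring again, with different canned values.
    dp0 = Mock(**{"load.return_value": 2})
    dp1 = Mock(**{"transform.return_value": 20})
    assert Sut(dp0, dp1).fn2() == 22

test_fn1()
test_fn2()
```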
Let's also have some real code to see what it would look like when a test needs to mock out all dependencies through the code path it's testing:
/** @test */
public function dispatchesActionsOnACollectionOfElementsFoundByALocator()
{
    $elementMock = $this->mock(RemoteWebElement::class)
        ->shouldReceive('sendKeys')
        ->times(3)
        ->with($text = 'some text...')
        ->andReturn(Mockery::self())
        ->mock();

    $this->inject(RemoteWebDriver::class)
        ->shouldReceive('findElements')
        ->with(WebDriverBy::class)
        ->andReturn([$elementMock, $elementMock, $elementMock])
        ->shouldReceive('getCurrentURL')
        ->zeroOrMoreTimes();

    $this->inject(WebDriverBy::class)
        ->shouldReceive('xpath')
        ->once()
        ->with($locator = 'someLocatorToMatchMultipleElements')
        ->andReturn(Mockery::self());

    $this->inject(Locator::class)
        ->shouldReceive('isLocator', 'isXpath')
        ->andReturn(true, false);

    $this->type($text, $locator);
}
Madness! In order to test a small method, you find yourself writing a test that is extremely unreadable and coupled to the implementation details of three or four other dependent methods up the call chain. It's even worse when you see the whole test case: lots of those mock blocks, duplicated with different expectations, covering different code paths. The test mirrors the implementation details of a few other methods. It's painful.
Workaround
OK, back to the first pseudo-code. While testing fn3(), you start thinking: what if I could mock the call to fn2() and put a stop to all the mocking madness? I make a partial mock of the SUT, set expectations for fn2(), and make sure that the method under test interacts with fn2() correctly.
In other words, to avoid excessive mocking of external dependencies, I focus on just one behavior of the SUT (which might be one or a couple of methods) and make sure it behaves correctly. I mock out all the other methods that belong to other behaviors of the SUT. No worries about them; they all have their own tests.
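As an illustration of that workaround, here is a sketch with Python's `unittest.mock.patch.object` (the `Sut` is a hypothetical port of the pseudo-code): fn2() is stubbed on the SUT itself, so the test for fn3() needs no knowledge of Dp0 or Dp1.

```python
from unittest.mock import patch

class Sut:
    def fn2(self, x):
        # The real implementation would reach through Dp0/Dp1;
        # elided here, because the partial mock bypasses it anyway.
        raise NotImplementedError("talks to Dp0 and Dp1")

    def fn3(self, x):
        # The behavior under test: delegate to fn2 and post-process.
        return self.fn2(x) * 2

def test_fn3_delegates_to_fn2():
    sut = Sut()
    # Partial mock: only fn2 is replaced, the rest of the SUT is real.
    with patch.object(sut, "fn2", return_value=21) as fn2:
        assert sut.fn3(5) == 42         # fn3's own logic
        fn2.assert_called_once_with(5)  # the interaction with fn2

test_fn3_delegates_to_fn2()
```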
Counterarguments
One might argue:
The problem with stubbing/mocking methods in a class is that you are violating the encapsulation. Your test should be checking to see whether or not the external behaviour of the object matches the specifications. Whatever happens inside the object is none of its business. By mocking the public methods in order to test the object, you are making an assumption about how that object is implemented.
When unit-testing, it's rare that you only deal with behaviors that are fully testable by providing input and expecting output, treating them as black boxes. Most of the time you need to test how they interact with one another. So we need at least some information about the internal implementation of the SUT to be able to fully test it.
When we mock a dependency, we're ALREADY making an assumption about how the SUT works. We're binding the test to the implementation details of the SUT. So, now that we're deep in the mud, why not mock an internal method to make our lives easier?!
Some may say:
Mocking the methods is taking the object (SUT) and breaking it into two pieces. One piece is being mocked while the other piece is being tested. What you are doing is essentially an ad-hoc breaking up of the object. If that's the case, just break up the object already.
Unit tests should treat the classes they test as black boxes. The only thing which matters is that their public methods behave the way that is expected. How the class achieves this through internal state and private methods does not matter. When you feel that it is impossible to create meaningful tests this way, it is a sign that your classes are too powerful and do too much. You should consider moving some of their functionality to separate classes which can be tested separately.
If there are arguments that support this separation (partially mocking the SUT), the same arguments can be used to refactor the class into two classes, and that is exactly what you should do then.
If it's a smell for SRP, yeah right, the functionality can be extracted into another class, and then you can easily mock that class and go happily home. But that's not the case here. The SUT's design is fine: it has no SRP issues, it's small, it does one job, and it adheres to the SOLID principles. Looking at the SUT's code, there's no reason to break the functionality out into other classes. It's already broken into very fine pieces.
How come, when you look at the SUT's tests, you decide to break up the class? How come it's OK to mock out all those dependencies along the way when testing fn3(), but it's not OK to mock the only real dependency it has, fn2(), even though it's an internal one? Either way, we're bound to the implementation details of the SUT. Either way, the tests are fragile.
It's important to notice why we want to mock those methods. We just want easier testing and fewer mocks, while maintaining the absolute isolation of the SUT (more on this later).
Others might reason:
The way I see it, an object has external and internal behaviors. External behavior includes returns values, calls into other objects, etc. Obviously, anything in that category should be tested. But internal behavior shouldn't really be tested. I don't write tests directly on the internal behavior, only indirectly through the external behavior.
Right, I do that, too. But we're not testing the internals of the SUT; we're just exercising its public API, and we want to avoid excessive mocking.
The reasoning states that external behavior includes calls into other objects; I agree. We're trying to test the SUT's external calls here as well, just by early-mocking the internal method that makes the interaction. Those mocked methods already have their own tests.
Another argument goes:
Too many mocks and already perfectly broken into multiple classes? You are over-unit-testing by unit-testing what should be integration tested.
The example code has just three external dependencies, and I don't think that's too many. Again, it's important to notice why I want to partially mock the SUT: solely for easier testing, avoiding excessive mocks.
That said, the reasoning might hold in some cases; I might need to do integration testing sometimes. More on this in the next section.
The last one says:
These are all tests, man, not production code; they don't need to be DRY!
I've actually read something like this! And I simply don't think so. I need to put my lifetime to good use! You do, too!
Bottom-line: To mock, or not to mock?
When we choose to mock, we're writing white-box unit tests. We're binding tests to the implementation details of the SUT, more or less. Then, if we decide to go down the PURE way and radically maintain the isolation of the SUT, sooner or later we find ourselves in the hell of mocking madness... and fragile tests. Ten months into maintenance, you find yourself serving the unit tests instead of them serving you! You find yourself re-implementing multiple tests for a single change in the implementation of one SUT method. Painful, right?
So, if we're going this way, why not partially mock the SUT? Why not make our lives waaaaaaaaay easier? I see no reason not to. Do you?
I've read and read, and finally came across this article by Uncle Bob:
https://8thlight.com/blog/uncle-bob/2014/05/10/WhenToMock.html
To quote the most important part:
Mock across architecturally significant boundaries, but not within those boundaries.
I think it's the remedy to all the mocking madness I told you about. There's no need to radically maintain the isolation of the SUT, as I had blindly learned. Even though it may work most of the time, it may also force you to live in your private mocking hell, banging your head against the wall.
This little gem of advice is the only reasoning that makes sense for not partially mocking the SUT. In fact, it's the exact opposite of doing so. But now the question would be: isn't that integration testing? Is it still called unit testing? What's the UNIT here? Architecturally significant boundaries?
Here's another article by Google Testing team, implicitly suggesting the same practice:
https://testing.googleblog.com/2013/05/testing-on-toilet-dont-overuse-mocks.html
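For illustration, here is a Python sketch of mocking only at a boundary (all names are hypothetical): an OrderService uses its in-boundary Pricing collaborator for real, and mocks only the PaymentGateway that crosses the architecturally significant boundary.

```python
from unittest.mock import Mock

class Pricing:
    """Internal collaborator: same side of the boundary, used for real."""
    def total(self, items):
        return sum(items)

class OrderService:
    def __init__(self, pricing, gateway):
        self.pricing, self.gateway = pricing, gateway

    def checkout(self, items):
        amount = self.pricing.total(items)
        self.gateway.charge(amount)  # crosses the boundary (payments system)
        return amount

def test_checkout():
    gateway = Mock()                            # mocked: across the boundary
    service = OrderService(Pricing(), gateway)  # real: within the boundary
    assert service.checkout([10, 15]) == 25
    gateway.charge.assert_called_once_with(25)

test_checkout()
```

The test exercises Pricing and OrderService together as one "unit", yet still runs fast and deterministically, because the only thing replaced is the external system.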
To recap
If we're going down the pure-isolation way, assuming that the SUT is already broken into fine pieces with the minimum possible external dependencies, is there any reason not to partially mock the SUT, in order to avoid excessive mocking and make the unit tests more DRY?
If we take Uncle Bob's advice to heart and only "Mock across architecturally significant boundaries, but not within those boundaries", is that still considered unit testing? What's the unit here?
Thank you for reading.
P.S. Those counterarguments are more or less from existing SO answers or articles I found on the subject. Unfortunately, I don't currently have the references to link to.

Unit tests don't have to be isolated unit tests, at least if you accept the definition promoted by authors like Martin Fowler and Kent Beck. Kent is the creator of JUnit, and probably the main proponent of TDD. These guys don't do mocking.
In my own experience (as long-time developer of an advanced Java mocking library), I see programmers misusing and abusing mocking APIs all the time. In particular, when they think that partially mocking the SUT is a valid idea. It is not. Making test code more DRY shouldn't be an excuse for over-mocking.
Personally, I favor integration tests with minimal or (preferably) no mocking. As long as your tests are stable and run fast enough, they are OK. The important thing is that tests don't become a pain to write and maintain, and, more importantly, that they don't discourage programmers from running them. (This is why I avoid functional UI-driven tests - they tend to be a pain to run.)

Related

Unittest method calling other Unittest methods. Bad Design?

I am writing unit tests for a legacy application without any automated tests. While writing them I realized that, without refactoring, the unit tests get pretty large. So, without refactoring, I need many functions to keep my tests readable.
Is it okay to use "Arrange" or "Setup" methods in your tests to keep them readable? Or is my test too complicated? Here is some pseudocode.
[TestMethod]
public void TestFoo()
{
    Obj1 obj1;
    Obj2 obj2;
    ...

    // arrange
    SetupObjects(obj1, obj2);
    // act
    Foo.foo(obj1, obj2);
    // assert
    AssertObjectStates(obj1, obj2);
}

void SetupObjects(Obj1 obj1, Obj2 obj2)
{
    CreateAndDoSomethingWithObj1(obj1);
    MockSomethingInObj2(obj2);
}
...
It is very common to have helper functions as part of test code. Some situations in which it can be beneficial are:
Test code tends to be repetitive, so helper functions can help to reduce code duplication. This simplifies the creation and maintenance of test suites with a larger number of tests. (Note that there are also other mechanisms to reduce code duplication: typically, the frameworks offer some kind of setup and teardown interface which are implicitly called before and after each test, respectively. However, these often lead to a phenomenon that Meszaros calls the "mystery guest": something magic seems to happen outside of the test code. Instead of these implicitly called setup and teardown methods, explicitly called and clearly named helper functions are a good mechanism to get the best of both worlds, namely to avoid duplication in test setups and still make it easy to fully understand each test.)
Encapsulate test code that uses implementation details or non-public interfaces: unit testing is the test level closest to the implementation, and if your intent is also to find implementation-specific bugs, some tests will be implementation-specific. There are also other situations where tests are built upon glass-box (aka white-box) knowledge, for example when you are aiming at code-coverage criteria (e.g. MC/DC coverage). This, however, means that when designing and implementing your tests, you have to find ways to keep test code maintenance effort low. By encapsulating the parts of the test code where implementation details are used, you can end up with tests that also find implementation-level bugs, but still keep maintenance effort at an acceptable level.
Now, looking at your code: it is certainly only example code, so the following may already be clear to you even if it is not apparent from the code. The way helper functions are used there is unfortunate. The function SetupObjects is not descriptive, and it sets up both objects individually, although to a reader of the test code it appears as if SetupObjects did something common to both objects.
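For contrast, here is a small Python sketch (the Account class and helper are hypothetical) of an explicitly called, descriptively named helper: it removes setup duplication while avoiding the "mystery guest", because the helper's name states the precondition it establishes.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

    def transfer_to(self, other, amount):
        self.balance -= amount
        other.balance += amount

def make_account_with_balance(balance):
    """Named for what it builds -- not a generic 'setup_objects'."""
    return Account(balance)

def test_transfer_moves_money():
    # arrange -- the helper names tell the reader the preconditions
    source = make_account_with_balance(100)
    target = make_account_with_balance(0)
    # act
    source.transfer_to(target, 30)
    # assert
    assert source.balance == 70
    assert target.balance == 30

test_transfer_moves_money()
```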

How should I unit test functions with many subfunctions?

I'm looking to better understand how I should test functions that have many substeps or subfunctions.
Let's say I have the function:
// Modifies the state of the class somehow
public void DoSomething(){
    DoSomethingA();
    DoSomethingB();
    DoSomethingC();
}
Every function here is public. Each subfunction has 2 paths, so to test every path through DoSomething() I'd have 2*2*2 = 8 tests. By writing 8 tests for DoSomething() I would have indirectly tested the subfunctions too.
So should I test like this, or instead write unit tests for each of the subfunctions and then write only 1 test case that checks the final state of the class after DoSomething(), ignoring all the possible paths? A total of 2+2+2+1 = 7 tests. But is it bad then that the DoSomething() test case depends on the other unit test cases for complete coverage?
There appears to be a very prevalent religious belief that testing should be unit testing. While I do not intend to underestimate the usefulness of unit testing, I would like to point out that it is just one possible flavor of testing, and its extensive (or even exclusive) use is indicative of people (or environments) that are somewhat insecure about what they are doing.
In my experience knowledge of the inner workings of a system is useful as a hint for testing, but not as an instrument for testing. Therefore, black box testing is far more useful in most cases, though that's admittedly in part because I do not happen to be insecure about what I am doing. (And that is in turn because I use assertions extensively, so essentially all of my code is constantly testing itself.)
Without knowing the specifics of your case, I would say that in general, the fact that DoSomething() works by invoking DoSomethingA() and then DoSomethingB() and then DoSomethingC() is an implementation detail that your black-box test should best be unaware of. So, I would definitely not test that DoSomething() invokes DoSomethingA(), DoSomethingB(), and DoSomethingC(), I would only test to make sure that it returns the right results, and using the knowledge that it does in fact invoke those three functions as a hint I would implement precisely those 7 tests that you were planning to use.
On the other hand, it should be noted that if DoSomethingA() and DoSomethingB() and DoSomethingC() are also public functions, then you should also test them individually, too.
Definitely test every subfunction separately (because they're public).
It will help you find the problem if one pops up.
If DoSomething only uses other functions, I wouldn't bother writing additional tests for it. If it has some other logic, I would test it, but assume all functions inside work properly (if they're in a different class, mock them).
The point is finding what the function does that is not covered in other tests and testing that.
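As a sketch of that strategy in Python (the Processor class is a hypothetical stand-in for DoSomething/DoSomethingA..C): each public subfunction gets its own focused test, and the composite gets one state-based test rather than 2*2*2 path tests.

```python
class Processor:
    """Hypothetical class mirroring DoSomething() and its subfunctions."""
    def __init__(self):
        self.state = []

    def do_something_a(self):
        self.state.append("a")

    def do_something_b(self):
        self.state.append("b")

    def do_something_c(self):
        self.state.append("c")

    def do_something(self):
        self.do_something_a()
        self.do_something_b()
        self.do_something_c()

# Each public subfunction has its own focused test...
def test_do_something_a():
    p = Processor()
    p.do_something_a()
    assert p.state == ["a"]

# ...and the composite gets ONE test on the final state.
def test_do_something_final_state():
    p = Processor()
    p.do_something()
    assert p.state == ["a", "b", "c"]

test_do_something_a()
test_do_something_final_state()
```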
Indirect testing should be avoided; you should write unit tests for each function explicitly. After that, you should mock the submethods and test your main function. For example:
You have a method which inserts a user into the DB, something like this:
void InsertUser(User user){
    var exists = SomeExternal.UserExists(user);
    if (exists)
        throw new Exception("bla bla bla");
    // insert code here
}
If you want to test the InsertUser function, you should mock the external/sub/nested methods and test the behaviour of InsertUser itself.
This example yields two tests: 1 - "When user exists, should throw exception"; 2 - "When user does not exist, should insert user".
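Those two tests can be sketched in Python with `unittest.mock` (the UserRepository and error type are hypothetical ports of the C# example):

```python
from unittest.mock import Mock

class UserExistsError(Exception):
    pass

class UserRepository:
    """Hypothetical port of the InsertUser example above."""
    def __init__(self, external):
        self.external = external
        self.inserted = []

    def insert_user(self, user):
        if self.external.user_exists(user):
            raise UserExistsError("bla bla bla")
        self.inserted.append(user)  # stands in for the real insert

# Test 1: "When user exists, should throw exception"
repo = UserRepository(Mock(**{"user_exists.return_value": True}))
try:
    repo.insert_user("alice")
    raised = False
except UserExistsError:
    raised = True
assert raised

# Test 2: "When user does not exist, should insert user"
repo = UserRepository(Mock(**{"user_exists.return_value": False}))
repo.insert_user("bob")
assert repo.inserted == ["bob"]
```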

Do mocks break the "test the interface, not the implementation" mantra?

From Wikipedia (emphasis mine, internal references removed):
In the book "The Art of Unit Testing" mocks are described as a fake object that helps decide whether a test failed or passed by verifying whether an interaction with an object occurred.
It seems to me that mocks are testing the implementation. Specifically, they test the way the implementation interacts with a particular object.
Am I interpreting this correctly? Are mocks an intentional breaking of the "test the interface, not the implementation" mantra? Or, are mocks testing at a level other than unit tests?
Correct, mocks do not follow the classicist mantra of "test the interface, not the implementation". Instead of state verification, mocks use behavior verification.
From http://martinfowler.com/articles/mocksArentStubs.html:
Mocks use behavior verification.
Mockist tests are thus more coupled to the implementation of a method. Changing the nature of calls to collaborators usually cause a mockist test to break.
This coupling leads to a couple of concerns. The most important one is the effect on Test Driven Development. With mockist testing, writing the test makes you think about the implementation of the behavior - indeed mockist testers see this as an advantage. Classicists, however, think that it's important to only think about what happens from the external interface and to leave all consideration of implementation until after you're done writing the test.
It seems to me that mocks are testing implementation.
Specifically, they test that the way that the implementation interacted with a particular object.
100% correct. However, this is still Unit testing, just from a different perspective.
Let's say you have a method that's supposed to perform a function on 2 numbers using some kind of MathsService. MathsService is given to a Calculator class in order to do the math for the calculator.
Let's pretend MathsService has a single method, PerformFunction(int x, int y) that is supposed to just return x+y.
Testing like this (everything below is pseudocode, with certain bits left out for clarity):
var svc = new MathsService();
var sut = new Calculator(svc);
int expected = 3;
int actual = sut.Calculate(1,2);
Assert.AreEqual(expected,actual,"Doh!");
That's a black box test of the unit Calculator.Calculate(). Your test doesn't know or care how the answer was arrived at. It's important because it gives you a certain level of confidence that your test works correctly.
However, consider this implementation of Calculator.Calculate:
public int Calculate(int x, int y)
{
    return 4;
}
Testing like this:
var svc = new Mock<IMathsService>(); // did I mention there was an interface? There's an interface...
svc.Setup(s => s.PerformFunction(1, 2)).Returns(3);
var sut = new Calculator(svc.Object);
sut.Calculate(1, 2);
svc.Verify(s => s.PerformFunction(1, 2), Times.Once, "Calculate() didn't invoke PerformFunction");
This white-box test doesn't tell you anything about the correctness of the PerformFunction method, but it does prove that, regardless of the result, the Calculator passed x and y to the IMathsService.PerformFunction method, which is what you want it to do.
You can of course write other tests to verify that the result of PerformFunction is passed back to the caller without modification, etc.
Armed with this knowledge, if your first unit test now fails, you can, with a high degree of confidence, jump right into the MathsService class to look for problems, because you know the issue is likely not in the Calculator.Calculate method.
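The same pair of tests can be sketched as runnable Python with `unittest.mock` (names are hypothetical ports of the C# example):

```python
from unittest.mock import Mock

class MathsService:
    def perform_function(self, x, y):
        return x + y

class Calculator:
    def __init__(self, svc):
        self.svc = svc

    def calculate(self, x, y):
        return self.svc.perform_function(x, y)

# Black-box test: only the result matters, not how it was computed.
assert Calculator(MathsService()).calculate(1, 2) == 3

# White-box test: only the interaction matters, not the result.
svc = Mock(**{"perform_function.return_value": 3})
Calculator(svc).calculate(1, 2)
svc.perform_function.assert_called_once_with(1, 2)
```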
Hope that helps...

Understanding stubs, fakes and mocks.

I have just started to read Professional Test Driven Development with C#: Developing Real World Applications with TDD
I have a hard time understanding stubs, fakes and mocks. From what I understand so far, they are fake objects used for the purpose of unit testing your projects, and a mock is a stub with conditional logic in it.
Another thing I think I have picked up is that mocks are somehow related to dependency injection, a concept which I only managed to understand yesterday.
What I do not get is why I would actually use them. I cannot seem to find any concrete examples online that explains them properly.
Can anyone please explain to me this concepts?
As I've read in the past, here's what I believe each term stands for
Stub
Here you are stubbing the result of a method to a known value, just to let the code run without issues. For example, let's say you had the following:
public int CalculateDiskSize(string networkShareName)
{
    // This method does things on a network drive.
}
You don't care what the return value of this method is, it's not relevant. Plus it could cause an exception when executed if the network drive is not available. So you stub the result in order to avoid potential execution issues with the method.
So you end up doing something like:
sut.WhenCalled(() => sut.CalculateDiskSize("someShare")).Returns(10);
Fake
With a fake you are returning fake data, or creating a fake instance of an object. A classic example are repository classes. Take this method:
public int CalculateTotalSalary(IList<Employee> employees) { }
Normally the above method would be passed a collection of employees that were read from a database. However in your unit tests you don't want to access a database. So you create a fake employees list:
IList<Employee> fakeEmployees = new List<Employee>();
You can then add items to fakeEmployees and assert the expected results, in this case the total salary.
Mocks
When using mock objects you intend to verify some behaviour, or data, on those mock objects. Example:
You want to verify that a specific method was executed during a test run, here's a generic example using Moq mocking framework:
public void Test()
{
    // Arrange
    var mock = new Mock<ISomething>();
    mock.Setup(m => m.MethodToCheckIfCalled()).Verifiable();
    var sut = new ThingToTest();

    // Act
    sut.DoSomething(mock.Object);

    // Assert
    mock.Verify(m => m.MethodToCheckIfCalled());
}
Hopefully the above helps clarify things a bit.
EDIT:
Roy Osherove is a well-known advocate of Test Driven Development, and he has some excellent information on the topic. You may find it very useful :
http://artofunittesting.com/
They are all variations of the Test Double. Here is a very good reference that explains the differences between them: http://xunitpatterns.com/Test%20Double.html
Also, from Martin Fowler's post: http://martinfowler.com/articles/mocksArentStubs.html
Meszaros uses the term Test Double as the generic term for any kind of pretend object used in place of a real object for testing purposes. The name comes from the notion of a Stunt Double in movies. (One of his aims was to avoid using any name that was already widely used.) Meszaros then defined four particular kinds of double:
Dummy objects are passed around but never actually used. Usually they are just used to fill parameter lists.
Fake objects actually have working implementations, but usually take some shortcut which makes them not suitable for production (an in-memory database is a good example).
Stubs provide canned answers to calls made during the test, usually not responding at all to anything outside what's programmed in for the test. Stubs may also record information about calls, such as an email gateway stub that remembers the messages it 'sent', or maybe only how many messages it 'sent'.
Mocks are what we are talking about here: objects pre-programmed with expectations which form a specification of the calls they are expected to receive.
Of these kinds of doubles, only mocks insist upon behavior verification. The other doubles can, and usually do, use state verification. Mocks actually do behave like other doubles during the exercise phase, as they need to make the SUT believe it's talking with its real collaborators.
This page of the PHPUnit manual helped me a lot as an introduction:
"Sometimes it is just plain hard to test the system under test (SUT) because it depends on other components that cannot be used in the test environment. This could be because they aren't available, they will not return the results needed for the test or because executing them would have undesirable side effects. In other cases, our test strategy requires us to have more control or visibility of the internal behavior of the SUT." More: https://phpunit.de/manual/current/en/test-doubles.html
And I found better "introductions" when searching for "test doubles", as mocks, fakes, stubs and the others are collectively known.

When should I mock?

I have a basic understanding of mock and fake objects, but I'm not sure I have a feeling about when/where to use mocking - especially as it would apply to this scenario here.
Mock objects are useful when you want to test interactions between a class under test and a particular interface.
For example, we want to test that method sendInvitations(MailServer mailServer) calls MailServer.createMessage() exactly once, and also calls MailServer.sendMessage(m) exactly once, and no other methods are called on the MailServer interface. This is when we can use mock objects.
With mock objects, instead of passing a real MailServerImpl, or a test TestMailServer, we can pass a mock implementation of the MailServer interface. Before we pass a mock MailServer, we "train" it, so that it knows what method calls to expect and what return values to return. At the end, the mock object asserts, that all expected methods were called as expected.
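With Python's `unittest.mock`, the "training" is largely implicit: you let the mock record calls and assert on the interactions afterwards. A minimal sketch, with a hypothetical Invitations class owning the sendInvitations() method described above:

```python
from unittest.mock import Mock

class Invitations:
    """Hypothetical owner of sendInvitations() from the example above."""
    def send_invitations(self, mail_server):
        # The interaction contract: one message created, one message sent.
        message = mail_server.create_message()
        mail_server.send_message(message)

mail_server = Mock()
Invitations().send_invitations(mail_server)

# The mock object verifies the expected interactions took place:
mail_server.create_message.assert_called_once_with()
mail_server.send_message.assert_called_once_with(
    mail_server.create_message.return_value)
```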
This sounds good in theory, but there are also some downsides.
Mock shortcomings
If you have a mock framework in place, you are tempted to use a mock object every time you need to pass an interface to the class under test. This way you end up testing interactions even when it is not necessary. Unfortunately, unwanted (accidental) testing of interactions is bad, because then you're testing that a particular requirement is implemented in a particular way, instead of testing that the implementation produces the required result.
Here's an example in pseudocode. Let's suppose we've created a MySorter class and we want to test it:
// the correct way of testing
testSort() {
    testList = [1, 7, 3, 8, 2]
    MySorter.sort(testList)
    assert testList equals [1, 2, 3, 7, 8]
}

// incorrect, testing implementation
testSort() {
    testList = [1, 7, 3, 8, 2]
    MySorter.sort(testList)
    assert that compare(1, 2) was called once
    assert that compare(1, 3) was not called
    assert that compare(2, 3) was called once
    ....
}
(In this example we assume that it's not a particular sorting algorithm, such as quick sort, that we want to test; in that case, the latter test would actually be valid.)
In such an extreme example, it's obvious why the latter test is wrong. When we change the implementation of MySorter, the first test does a great job of making sure we still sort correctly, which is the whole point of tests: they allow us to change the code safely. The latter test, on the other hand, always breaks, and it is actively harmful; it hinders refactoring.
Mocks as stubs
Mock frameworks often also allow a less strict usage, where we don't have to specify exactly how many times methods should be called and what parameters are expected; they allow creating mock objects that are used as stubs.
Let's suppose we have a method sendInvitations(PdfFormatter pdfFormatter, MailServer mailServer) that we want to test. The PdfFormatter object can be used to create the invitation. Here's the test:
testInvitations() {
    // train as stub
    pdfFormatter = create mock of PdfFormatter
    let pdfFormatter.getCanvasWidth() returns 100
    let pdfFormatter.getCanvasHeight() returns 300
    let pdfFormatter.addText(x, y, text) returns true
    let pdfFormatter.drawLine(line) does nothing

    // train as mock
    mailServer = create mock of MailServer
    expect mailServer.sendMail() called exactly once

    // do the test
    sendInvitations(pdfFormatter, mailServer)
    assert that all pdfFormatter expectations are met
    assert that all mailServer expectations are met
}
In this example, we don't really care about the PdfFormatter object, so we just train it to quietly accept any call and return sensible canned values for all the methods that sendInvitations() happens to call at this point. How did we come up with exactly this list of methods to train? We simply ran the test and kept adding methods until it passed. Notice that we trained the stub to respond to each method without having a clue why the code needs to call it; we simply added everything the test complained about. We are happy; the test passes.
But what happens later, when we change sendInvitations(), or some other class that sendInvitations() uses, to create fancier PDFs? Our test suddenly fails because now more methods of PdfFormatter are called and we didn't train our stub to expect them. And usually it's not only one test that fails in situations like this; it's any test that happens to use, directly or indirectly, the sendInvitations() method. We have to fix all those tests by adding more training. Also notice that we can't remove methods that are no longer needed, because we don't know which of them are not needed. Again, it hinders refactoring.
Also, the readability of the test suffers terribly; there's lots of code there that we wrote not because we wanted to, but because we had to; it's not us who want that code there. Tests that use mock objects look very complex and are often difficult to read. Tests should help the reader understand how the class under test should be used, so they should be simple and straightforward. If they are not readable, nobody is going to maintain them; in fact, it's easier to delete them than to maintain them.
How to fix that? Easily:
Try using real classes instead of mocks whenever possible. Use the real PdfFormatterImpl. If that's not possible, change the real classes to make it possible. Not being able to use a class in tests usually points to some problems with the class. Fixing the problems is a win-win situation: you've fixed the class and you have a simpler test. Not fixing it and using mocks, on the other hand, is a no-win situation: you didn't fix the real class and you have more complex, less readable tests that hinder further refactoring.
Try creating a simple test implementation of the interface instead of mocking it in each test, and use this test class in all your tests. Create TestPdfFormatter that does nothing. That way you can change it once for all tests and your tests are not cluttered with lengthy setups where you train your stubs.
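Such a hand-rolled test double might look like this in Python (a sketch; the method list is assumed from the pseudocode above):

```python
class TestPdfFormatter:
    """One shared, do-almost-nothing implementation used by every test,
    instead of re-training a stub in each one."""
    def get_canvas_width(self):
        return 100

    def get_canvas_height(self):
        return 300

    def add_text(self, x, y, text):
        return True

    def draw_line(self, line):
        pass  # intentionally does nothing

# Any test can now simply do:
formatter = TestPdfFormatter()
assert formatter.get_canvas_width() == 100
assert formatter.add_text(0, 0, "You're invited!") is True
```

When the real interface grows, you extend this one class once, rather than patching trained stubs in every affected test.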
All in all, mock objects have their use, but when not used carefully, they often encourage bad practices, testing implementation details, hinder refactoring and produce difficult to read and difficult to maintain tests.
For some more details on shortcomings of mocks see also Mock Objects: Shortcomings and Use Cases.
A unit test should test a single codepath through a single method. When the execution of a method passes outside of that method, into another object, and back again, you have a dependency.
When you test that code path with the actual dependency, you are not unit testing; you are integration testing. While that's good and necessary, it isn't unit testing.
If your dependency is buggy, your test may be affected in such a way that it returns a false positive. For instance, you may pass the dependency an unexpected null, and the dependency may not throw on null as it is documented to do. Your test does not encounter the null-argument exception it should have, and the test passes.
Also, you may find it's hard, if not impossible, to reliably get the dependent object to return exactly what you want during a test. That also includes throwing expected exceptions within tests.
A mock replaces that dependency. You set expectations on calls to the dependent object, set the exact return values it should give you to perform the test you want, and/or what exceptions to throw so that you can test your exception handling code. In this way you can test the unit in question easily.
TL;DR: Mock every dependency your unit test touches.
Rule of thumb:
If the function you are testing needs a complicated object as a parameter, and it would be a pain to simply instantiate this object (if, for example it tries to establish a TCP connection), use a mock.
You should mock an object when you have a dependency in a unit of code you are trying to test that needs to be "just so".
For example, when you are trying to test some logic in your unit of code but you need to get something from another object and what is returned from this dependency might affect what you are trying to test - mock that object.
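A minimal Python sketch of that situation (the RiskChecker and credit-bureau names are hypothetical): the logic under test depends on what the dependency returns, so the dependency is mocked to return it "just so", in both directions.

```python
from unittest.mock import Mock

class RiskChecker:
    def __init__(self, credit_bureau):
        self.credit_bureau = credit_bureau

    def approve(self, customer):
        # The logic under test hinges on what the dependency returns.
        return self.credit_bureau.score(customer) >= 700

# Mock the dependency to return exactly the values each branch needs:
assert RiskChecker(Mock(**{"score.return_value": 720})).approve("ann") is True
assert RiskChecker(Mock(**{"score.return_value": 650})).approve("bob") is False
```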
A great podcast on the topic can be found here