Separate Class vs Method - unit-testing

Quick design question.
ClassA has a method called DoSomething(args)
In DoSomething(), before it can actually do something, it needs to do some preparatory work with args. I believe this should be encapsulated within ClassA, (as opposed to doing the prep work outside and passing it in) as nothing else needs to know that this prep work is required to DoSomething.
However, it's where the actual preparatory work code belongs that is making me think.
The preparatory work in my particular example is to create a list of items, which satisfy a certain condition, from args.
My hunch is that I should create a new class, ListOfStuff, which takes args in its constructor and put this preparatory work here.
From a TDD-perspective, I think this is the right choice. We can then unit test ListOfStuff til our heart is content. If we had put the preparatory work in a private method of ClassA, we'd have only been able to indirectly test it through testing DoSomething().
But is this overkill? Since adopting the TDD and DI approach, I've seen the number of classes that I write multiply - should I be worried?
Ta.

There are a couple of heuristics here.
Is there state on this class that
survives from invocation to
invocation? Does this prep work get
done every time you need to
doSomething(), or is it done and
saved? If so, that argues for the
class.
Does this computation need to happen
in more than once place? If so,
that argues for a class.
Can the details of the
implementation of the doSomething()
method, or the preparation work for
it, change without affecting the
enclosing class? If so, that argues
for a class.
Well, three heuristics. No one expects the Spanish Inquisition.

What is the simplest thing that could possibly work? That's the TDD mantra. Don't try to think too far ahead. If the time comes to create the helper class, you'll know it because you'll be doing all sorts of related work in multiple methods in other classes. Until then, do the work in your method. If it makes the method too long or cumbersome to read, extract the work to its own method. This method can also be tested to your heart's content without the need for another class.

Definately put it in a new class. Its called separation of concerns. You don't want to overload a class and make it do all sorts of other stuff. This is because your class will not be able to be used anywhere else as its so specific to one thing.
Put it in class, then use that class elsewhere. Otherwise you'd have to write this again and again in the future.
To make it extensible, and be able to pass in all sorts of different algorithms, the design pattern your after here is the Strategy pattern. But that is for the future...

Your object model design should be considered on it's own, rather than in the context of your development strategy. If building ListOFStuff and passing to to DoSomething really is how the object model fits together best, do it, regardless of your development strategy.
I think you've answered your own question a little, however, since ListOfStuff makes it easier to unit test, that probably means it's also a cleaner design.
Hope that helps!

Since adopting the TDD and DI
approach, I've seen the number of
classes that I write multiply - should
I be worried?
If your aim is procedural programming, yes. But since you're almost certainly wanting to work in an OO fashion, no.
Most people use too few types, spend too little time thinking about OO design.
Your questions about class responsibility reflect maturation of thinking (imho).

SoC is valid only if the "concern" in question has nothing to do with the class, or you find classes sharing this "concern" (aka, violating DRY). In your case, it seems that the code is intrinsic to the class - so possibly a private function would be more apt.
As tvanfosson said above, you need to balance SoC with YAGNI. Personally - I think you might be mulling over this prematurely (I know! I do it too all the time).

Related

C++ Singleton Design pattern alternatives

I hate to beat a dead horse, that said, I've gone over so many conflicting articles over the past few days in regards to the use of the singleton pattern.
This question isn't be about which is the better choice in general, rather what makes sense for my use case.
The pet project I'm working on is a game. Some of the code that I'm currently working on, I'm leaning towards using a singleton pattern.
The use cases are as follows:
a globally accessible logger.
an OpenGL rendering manager.
file system access.
network access.
etc.
Now for clarification, more than a couple of the above require shared state between accesses. For instance, the logger is wrapping a logging library and requires a pointer to the output log, the network requires an established open connection, etc.
Now from what I can tell it's more suggested that singletons be avoided, so lets look at how we may do that. A lot of the articles simply say to create the instance at the top and pass it down as a parameter to anywhere that is needed. While I agree that this is technically doable, my question then becomes, how does one manage the potentially massive number of parameters? Well what comes to mind is wrapping the different instances in a sort of "context" object and passing that, then doing something like context->log("Hello World"). Now sure that isn't to bad, but what if you have a sort of framework like so:
game_loop(ctx)
->update_entities(ctx)
->on_preupdate(ctx)
->run_something(ctx)
->only use ctx->log() in some freak edge case in this function.
->on_update(ctx)
->whatever(ctx)
->ctx->networksend(stuff)
->update_physics(ctx)
->ctx->networksend(stuff)
//maybe ctx never uses log here.
You get the point... in some areas, some aspects of the "ctx" aren't ever used but you're still stuck passing it literally everywhere in case you may want to debug something down the line using logger, or maybe later in development, you actually want networking or whatever in that section of code.
I feel like the above example would much rather be suited to a globally accessible singleton, but I must admit, I'm coming from a C#/Java/JS background which may color my view. I want to adopt the mindset/best practices of a C++ programmer, yet like I said, I can't seem to find a straight answer. I also noticed that the articles that suggest just passing the "singleton" as a parameter only give very simplistic use cases that anyone would agree a parameter would be the better way to go.
In this game example, you probably wan't to access logging everywhere even if you don't plan on using it immediately. File system stuff may be all over but until you build out the project, it's really hard to say when/where it will be most useful.
So do I:
Stick with using singletons for these use cases regardless of how "evil/bad" people say it is.
Wrap everything in a context object, and pass it literally everywhere. (seems kinda gross IMO, but if that's the "more accepted/better" way of doing it, so be it.)
Something completely else. (Really lost as to what that might be.)
If option 1, from a performance standpoint, should I switch to using namespace functions, and hiding the "private" variables / functions in anonymous namespaces like most people do in C? (I'm guessing there will be a small boost in performance, but then I'll be stuck having to call an "init" and "destroy" method on a few of these rather than being able to just allow the constructor/destructor to do that for me, still might be worth while?)
Now I realize this may be a bit opinion based, but I'm hoping I can still get a relatively good answer when a more complicated/nested code base is in question.
Edit:
After much more deliberation I've decided to use the "Service Locator" pattern instead. To prevent a global/singleton of the Service Locator I'm making anything that may use the services inherit from a abstract base class that requires the Service Locator be passed when constructed.
I haven't implemented everything yet so I'm still unsure if I'll run into any problems with this approach, and would still love feedback on if this is a reasonable alternative to the singleton / global scope dilemma.
I had read that Service Locator is also somewhat of an anti-pattern, that said, many of the example I found implemented it with statics and/or as a singleton, perhaps using it as I've described removes the aspects that cause it to be an anti-pattern?
Whenever you think you want to use a Singleton, ask yourself the following question: Why is it that it must be ensured at all cost that there never exists more than one instance of this class at any point in time? Because the whole point of the Singleton pattern is to make sure that there can never be more than one instance of the Singleton. That's what the term "singleton" is all about: there only being one. That's why it's called the Singleton pattern. That's why the pattern calls for the constructor to be private. The point of the Singleton pattern is not and never was to give you a globally-accessible instance of something. The fact that there is a global access point to the sole instance is just a consequence of the Singleton pattern. It is not the objective the Singleton pattern is meant to achieve. If all you want is a globally accessible instance of something, then use a global variable. That's exactly what global variables are for…
The Singleton pattern is probably the one design pattern that's singularly more often misunderstood than not. Is it an intrinsic aspect of the very concept of a network connection that there can only ever be one network connection at a time, and the world would come to an end if that constraint was ever to be violated? If the answer is no, then there is no justification for a network connection to ever be modeled as a Singleton. But don't take my word for it, convince yourself by checking out page 127 of Design Patterns: Elements of Reusable Object-Oriented Software where the Singleton pattern was originally described…😉
Concerning your example: If you're ending up having to pass a massive number of parameters into some place then that first and foremost tells you one thing: there are too many responsibilities in that place. This fact is not changed by the use of Singletons. The use of Singletons simply obfuscates this fact because you're not forced to pass all stuff in through one door in the form of parameters but rather just access whatever you want directly all over the place. But you're still accessing these things. So the dependencies of your piece of code are the same. These dependencies are just not expressed explicitly anymore at some interface level but creep around in the mists. And you never know upfront what stuff a certain piece of code depends on until the moment your build breaks after trying to take away one thing that something else happened to depend upon. Note that this issue is not specific to the Singleton pattern. This is a concern with any kind of global entity in general…
So rather than ask the question of how to best pass a massive number of parameters, you should ask the question of why the hell does this one piece of code need access to that many things? For example, do you really need to explicitly pass the network connection to the game loop? Should the game loop not maybe just know the physics world object and that physics world object is at the moment of creation given some object that handles the network communication. And that object in turn is upon initialization told the network connection it is supposed to use? The log could just be a global variable (or is there really anything about the very idea of a log itself that prohibits there ever being more than one log?). Or maybe it would actually make sense for each thread to have its own log (could be a thread-local variable) so that you get a log from each thread in the order of the control flow that thread happened to take rather than some (at best) interleaved mess that would be the output from multiple threads for which you'd probably want to write some tool so that you'd at least have some hope of making sense of it at all…
Concerning performance, consider that, in a game, you'll typically have some parent objects that each manage collections of small child objects. Performance-critical stuff would generally be happening in places where something has to be done to all child objects in such a collection. The relative overhead of first getting to the parent object itself should generally be negligible…
PS: You might wanna have a look at the Entity Component System pattern…

What's a good way to unit test the main function that calls all the simple functions?

I have this code with lots of small functions f_small_1(), f_small_2(), ... f_small_10().
These are easy to unit test individually.
But in the real world, you often have a complex function f_put_things_together() that needs to call the smaller ones.
What is a good way to unit test f_put_things_together ?
func f_put_things_together() {
a = f_small_1()
if a {
f_small_2()
} else {
f_small_3()
}
f_small_4()
...
f_small_10()
}
I started to write tests but I have the impression that Im doing this twice as I have already tested the smaller functions.
I could have f_put_things_together take objects a1, a2, ..., a10 as arguments and call a1.f_small_1(), a2.f_small_2(), ... so that I can mock these objects individually but this doesn't feel right to me: if I didn't have to write unit tests, all these functions would logically belong to the same class, and I don't want to have unclear code for the sake of testing.
This is somehow language agnostic and somehow not, as languages like Python enable you to replace methods of an object. So if you have an answer that is language agnostic, that's best. Else Im currently using Go.
The general case that you've shown in your example demonstrates the need to test both the simple functions and the aggregation of the results of those functions. When testing the aggregating function, you really want to fake the results of the smaller functions the aggregating function depends on. So, you're on the right track.
However, if you're having trouble writing unit tests for your code, then you're probably having one of these classes of problems:
You've somehow violated the SOLID principles (description here). In other words, something is deficient in the micro-architecture of your code.
You're trying to fake someone else's interface and you're having trouble matching the actual behavior of their implementation with your fake implementation. (This doesn't seem to be the case here).
The objects that you're testing with require a bunch of data setup that should be simplified, at least within the context of testing (also, doesn't appear to be the case).
If you're tests are painful to write, they're telling you something! With experience, you'll be able to quickly pick up on the pain point in your implementation that the tests are indicating.
Unfortunately, your example is a bit small and abstract. To be more precise, I don't know what f_small_1 ... f_small_10 do. So, with more details, I might make more precise recommendations for doing some small refactoring that could have a big payoff for your testing.
I can say, however, that it appears that f_put_things_together looks a bit big to me. This could be a violation of the Single Responsibility Principle (the 'S' in SOLID). I see 10 function calls at a minimum along with some branching logic.
You'll need to write a separate test for each branch path through your function. The less branching you have in a particular function, the less tests you'll need to write. For more information, take a look at Cyclomatic Complexity. In this case, it seems the method has a low CC, so this likely isn't the problem.
The ten calls to smaller functions do make me wonder a bit. It looks like for simplicity you've left out capturing the return value of these function calls and the logic for aggregating the results. In that case, yes you really do want to fake the results of the smaller functions then write a few tests to check the algorithm you're using to aggregate everything.
Or, perhaps the functions are all void and you need to verify that everything happened, and maybe that it happened in the right order. In that case, you're looking at writing more of an interaction-based test. You'll still want to put those smaller function calls behind an interface / class / object that you fake. In this case, the fake should capture the calls and the call order so that your test can make the assertions that are really important.
If some of the smaller functions are related to each other, it might make sense to group them together in a single method a separate class. Then, your test for f_put_things_together will have fewer dependencies that need to be faked. You will have a new class that also needs tested, but it's much easier to test two smaller methods than to test one large one that has too much responsibility.
Summary
This is actually a very good question with the exception of it being a bit vague. If you can provide a more detailed example, perhaps I could make more detailed recommendations. The bottom line is this: If your tests are difficult to write then either you need some help / coaching on writing tests or something about the design of your implementation is off and your tests are trying to tell you what it is.

Faking a Method of the Object Under Test

Is there a reason why you shouldn't create a partial fake of an object or just fake one method on the object that you are testing of it for the sake of testing another method? This might be helpful to save you from making an entire new mock object, or when there is an external dependency in the method you are faking which you can't reasonably get rid of and would like to keep out of all the other unit tests?
The objects you want to do this for are trying to do too many things. In particular, if you have an external dependency, you would normally create an object to isolate that dependency. The Façade pattern is one example of this. If your objects weren't designed with testability in mind you may have to do some refactoring. Take a look at Michael Feathers' PDF on working with legacy code(PDF). He also has a book by the same title that goes into much more detail.
It is a very bad idea to mock/fake part of a class to test another.
Doing this, you are not testing what the real code does in the conditions under test leading to unreliable test results.
It also increases the maintenance burden of the faked part of the class. If this is in effect for the whole test program, the fake implementation also makes other tests on the faked method harder.
You need to ask yourself why you need to fake out the part under test.
If it is because the method is accessing a file or database, then you should define an interface and pass an instance of that interface to the class constructor or method. This allows you to test different scenarios in the same test application.
If it is because you are using singletons, you should rethink your design to make it more testable: removing singletons will remove implicit dependencies and maintenance nightmares.
If you are using static methods/free-standing functions to access data in a registry or settings file, you should really move that out of the function under test and pass the data as a parameter or provide a settings provider interface. This will make the code more flexible and robust.
If it is to break a dependency for the purpose of testing (e.g. faking out a vector method to test a method in a matrix class) then you should not be faking that -- you should treat the code under test as what is defined by the class under test by its public interface: methods; pre-conditions, post-conditions, invariants, documentation, parameters and exception specifications.
You can use knowledge of the implementation details to test special edge cases, but trigger those through the main API, not by faking an implementation detail.
For example, suppose you faked std::vector::at() but the implementation switched to use operator[] instead. Your test would break or silently pass.
If the method you want to fake is virtual (as in, not static and not final), then you can subclass your object in your test, override the method in the subclass, and exercise the subclass in the test. No mock-object libraries required.
(Ideally you should consider refactoring, this is not a great long-term solution. But it is a way to get legacy code under test so you can start the refactoring process more easily.)
The Extract and Override technique described in Chapter 3 of Roy Osherove's The Art of Unit Testing does seem to be a way to fake part of the class under test (pp. 71-77). Osherove does not address the concerns raised in some of the other answers to this question.
In addition, Michael Feathers discusses this in Working Effectively with Legacy Code. He terms the resulting class a testing subclass (227) and the technique Subclass and Override Method (401). Now, granted, Feathers is not giving an exposition of pristine techniques that are recommended on new code. But he still gives it serious treatment as a potentially helpful technique.
I also asked my former computer professor about this. He is well-read and currently works full-time in the software industry, where he has advanced rapidly. He said that this technique definitely has a good application, and that there are several dozen classes in the codebase at his company that are under test in this way. He said that, like any technique, it can be overused.
I originally wrote the question when I was new to unit testing and knew next to nothing about dependency injection. Now, after some experience with both, I would add that the need to use this testing technique could be a smell. It may be a sign that need to rework your approach to dependencies. If the method that needs to be faked is one that is inherited from a base class, it may mean that you need to take the adage "favor composition over inheritance" more seriously. You should inject your dependencies rather than inheriting them.
There are some really nice packages for facilitating this kind of stuff. For instance, from the Mockito docs:
//You can mock concrete classes, not only interfaces
LinkedList mockedList = mock(LinkedList.class);
//stubbing
when(mockedList.get(0)).thenReturn("first");
does some real magic that's hard to believe at first. When you call
String firstMember = mockedList.get(0);
you'll get back "first", because of what you said in the "when" statement.

Do I only have to mock out external dependencies in a unit test? What's about internal dependencies?

Do I only have to mock out external dependencies in a unit test?
What if my method that I want to test, has a dependency on another class within the same assembly? Do I have to mock out the dependency for going sure to test only one thing and there for to make a unit test instead of an integration test?
Is an integration test a test that tests dependencies in general or do I have to difference between internal and external dependencies?
An example would be a method that has 2000 lines of code with 5 method invocations (all methods coming from the same assembly).
Generally a proper unit test is testing only that single piece of code. So a scenario like this is where you start to ask yourself about the coupling of these two classes. Does Class A internally depend on the implementation of Class B? Or does it just need to be supplied an instance of Type B (notice the difference between a class and a type)?
If the latter, then mock it because you're not testing Class B, just Class A.
If the former, then it sounds like creating the test has identified some coupling that can (perhaps even should) be re-factored.
Edit: (in response to your comment) I guess a key thing to remember while doing this (and retro-fitting unit tests into a legacy system is really, really difficult) is to mentally separate the concepts of a class and a type.
The unit tests are not for Class A, they are for Type A. Class A is an implementation of Type A which will either pass or fail the tests. Class A may have an internal dependency on Type B and need it to be supplied, but Type A might not. Type A is a contract of functionality, which is further expressed by its unit tests.
Does Type A specify in its contract that implementations will require an instance of Type B? Or does Class A resolve an instance of it internally? Does Type A need to specify this, or is it possible that different implementations of Type A won't need an instance of Type B?
If Type A requires an instance of Type B, then it should expose this externally and you'd supply the mock in your tests. If Class A internally resolves an instance of Type B, then you'd likely want to be using an IoC container where you'd bootstrap it with the mock of Type B before running the tests.
Either way, Type B should be a mock and not an implementation. It's just a matter of breaking that coupling, which may or may not be difficult in a legacy system. (And, additionally, may or may not have a good ROI for the business.)
Working with a code base you're describing isn't easy with multiple problems combined into something you don't know how to start changing. There are strong dependencies between classes as well as between problems and maybe even no overall design.
In my experience, this takes a lot of effort and time as well as skill in doing this kind of work. A very good resource to learn how to work with legacy code is Michael Feather's book: Working Effectively with Legacy Code.
In short, there are safe refactorings you can do without risking to break things, which might help you get started. There are also other refactorings which require tests to protect how things work. Tests are essential when refactoring code. This doesn't of course come with a 100% guarantee that things don't break, because there might be so many hidden "features" and complexity you cannot be aware of when you start. Depending on the code base the amount of work you need to do varies greatly, but for large code bases there is usually a lot of work.
You'll need to understand what the code does, either by simply knowing it or by finding out what the current code does. In either case, you start by writing "larger" tests which are not really unit tests, they just protect the current code. They might cover larger parts, more like integration/functional tests. These are your guards when you start to refactor the code. When you have such tests in place and you feel comfortable what the code does, you can start refactoring the parts the "larger" tests cover. For the smaller parts you change you write proper unit tests. Iterating doing various refactorings will at some point make the initial large tests unnecessary because you now have a much better code base and unit tests (or you simply keep them as functional test).
Now, coming back to your question.
I understand what you mean with your question, but I'd still like to change it slightly because there are more important aspects than external and internal. I believe a better question is to ask which dependencies do I need to break to get a better design and to write unit tests?
The answer to this question is you should break all dependencies you are not in control over, slow, non-deterministic or pulls in too much state for a single unit test. These are for sure all external (filesystem, printer, network etc.). Also note that multi-threading is not suitable for unit tests because this is not deterministic. For internal dependencies I assume you mean classes with members or functions calling other functions. The answer to this is maybe. You need to decide if you are in control and if the design is good. Probably in your case you are not in control and the code is not good, so here you need to refactor things to get things under control and into a better design. Michael Feather's book is great here, but you need to find how to apply the things on your code base of couse.
One very good technique for breaking dependencies is dependency injection. In short, it changes the design so that you pass in the members a class uses instead of letting the class itself instantiate them. For these you have an interface (abstract base class) for these dependencies you pass in, so you can easily change what you pass in. For instance, using this you can have different member implementations for a class in production and when you do unit test. This is a great technique and also leads to good design if use wisely.
Good luck and take your time! ;)
Generally speaking, a method with 2000 lines of code is just plain BAD. I usually start to look for reasons to make new classes -- not even methods, but classes -- when i have to use the pagedown key more than three or four times to browse through it (and collapsable regions doesn't count).
So, yes you do need to get rid of dependencies from outside and inside of the assembly, and you need to think of responsibility of the class. It sounds like this one has way too much weight on its shoulders, and it sounds like it is very close to impossible to write unittests for. If you think testability, you will automatically start to inject dependencies, and downsize your classes, and BAM!!!There you have it; nice and pretty code!! :-)
Regards,
Morten

Finding Implicit Communication in Classes

I am currently refactoring a very useful but poorly designed class in C++, and I'm running into a problem with the design: rather passing data around using arguments to methods, the data is passed around by setting private state variables in the class. This makes it very difficult for me to diagram out how data moves through functions. It's my weekend task to try and remove this style of passing data around as much as possible, as makes the program very impossible to understand from just the method signatures, as the signatures only tell a part of the story. I've decided
My current approach to test if a method communicates using private class-level variables is the following:
Edit the method and make it a function rather than a method, which removes its access to the state variables in the class.
Edit all of the calls to the method so that they call the function rather than the method.
Compile, see if anything breaks. Make a list of accessors to add to the original class.
Run the unit tests to see if I've broken anything in a very subtle way.
Is there a better way of doing this, perhaps one that can be easily automated? Is this refactoring a well-known technique that I can cite if I show it to other people?
The only mention of this problem that I've found so far is this quote from Coders at Work via the Object-oriented programming Wikipedia entry:
"The problem with object-oriented languages is they've got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle." - Joe Armstrong
Edit in response to a good question from Oli Charlesworth:
I understand that the point of OOP is to sometimes communicate through state variables of the class. The difficulty with my current case is that there are currently 78 different data members in the class, many of which are key-value pairs of strings to other data types, and there are undocumented implicit dependencies on the order in which they need to be initialized. It's possible that given a sufficiently smart programmer working with this class would be easy, but it's currently very difficult for me. I think that several of these data types could be abstracted into their own classes, but before I can do that I need to understand more clearly how the data members interact with each other.
Given the clarification in the question my "are you sure it's not just that you don't like the other programmer's style" comment dies a death ;)
Personally I'd just refactor normally. That is, with 78 data members and lots of bits that are related but not in a class of their own I'd start by grouping the related data and extracting the functionality that works on it. There's no need, IMHO, to go through a stage where you explicitly pass the data into the functions in the existing class. Just pick a group of related data items, come up with a decent name, extract them and work out where they were used and how you need to move functionality into the new class.
Ideally, I'd start writing unit tests for the main class and the new broken out classes as I went along...
Instead of making all of the method's callers call the function, a smaller intermediate change would be to leave the method in place for all callers, and have it simply delegate by calling the function. Later you can inline the method call so all callers are directly calling the function.
Also, from your description it sounds like you are approaching this with manual testing. You will have better success (easier refactoring with reduced risk of error) with comprehensive unit tests in place, although of course the code you describe would be hard to unit test. Nevertheless, work toward more test automation.