Testing approach for algorithms with complex outputs - unit-testing

How do you test the result of a program that is basically a black box? For example, a year ago I had to write a B-tree as homework, and I really struggled with testing its correctness. What strategies do you use in such scenarios? Visualization? Robust input-to-result sets of test data? What do you do when it is hard to get such data because the only way to get it is from your properly working program?
EDIT: I think that my question was misunderstood. There was no problem with understanding how a B-tree works. That is trivial. But writing robust tests for validating its proper functionality is not so trivial. I think that this school problem is similar to many practical REAL WORLD scenarios and test cases. And sometimes understanding the domain is quite different from delivering a working and correct program...
EDIT2: And yes, with a B-tree it is possible to validate proper behavior with pen and paper. But this is really dirty and not fun :) And it does not work well with problems that require a huge amount of data for their validation...

I'm not sure these answers really capture the problem at hand. A B-tree's input and output aren't any different from those of any other dictionary, but the algorithm performs better, if it's implemented correctly. It's only really got two functions to test (add and find), so theoretically, "black-box" testing of this single component should be fine. Designing for testability isn't the issue, since no matter how you do it the whole algorithm will be one component.
So the question is: when you have to implement subtle algorithms, the kinds with complicated output that you can't always understand in your head so well, how do you test them? I think there are three different strategies you can use:
Black-box test basic functionality. For the B-tree case, this is things like cwash suggested, and also, things like making sure that when you add an item, you can then find it, etc.
Test certain invariants that your algorithm should maintain (the B-tree should be balanced, values within nodes should be sorted, etc.); see the sketch after this list.
A few small "pencil-and-paper" tests may be necessary: work the algorithm out by hand and check that it matches what your code does. But the big-data tests can all be of type 2. These can also be brittle, so unless you need to be really sure about your algorithm, you may want to avoid them.
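To make strategy 2 concrete, here is a minimal sketch in C++ of an invariant checker, assuming a hypothetical Node layout (the struct and field names are invented for illustration, not taken from any particular implementation):

#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node layout, for illustration only.
struct Node {
    std::vector<int>   keys;      // keys stored in this node
    std::vector<Node*> children;  // empty for leaves
};

// Invariant 1: keys within every node are sorted.
// Invariant 2: every leaf sits at the same depth (the tree is balanced).
// Returns the leaf depth; asserts on any violation.
int checkInvariants(const Node* n, int depth = 0) {
    for (std::size_t i = 1; i < n->keys.size(); ++i)
        assert(n->keys[i - 1] < n->keys[i]);           // sorted within node
    if (n->children.empty())
        return depth;                                  // leaf: report its depth
    assert(n->children.size() == n->keys.size() + 1);  // internal node shape
    int leafDepth = checkInvariants(n->children[0], depth + 1);
    for (std::size_t i = 1; i < n->children.size(); ++i)
        assert(checkInvariants(n->children[i], depth + 1) == leafDepth);
    return leafDepth;
}

Call checkInvariants after every insert in a randomized bulk test; thousands of random keys can then stand in for hand-worked examples.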

If you do not grasp the problem at hand, how can you develop a solution to it? My suggestion would be to understand the domain enough to be able to work out the problem on paper and ensure that your program matches.

Consult with an expert on the subject.
I know that if I have a convoluted procedure I'm trying to fix, and I have no idea what the output should be after my changes, I need to consult a fellow developer with more knowledge of the business need, who can verify that what I've done is correct.

I would focus on constructing test cases that exercise the functionality of your B-tree algorithm. I haven't looked at it for years, but I'm fairly sure you'll be able to find a documented sequence of steps to insert a set of values in a specific order, then validate that the leaf nodes are as they should be. If you construct your testing along these lines, you should be able to prove your implementation is correct.

The key is to know there is a balance between testing something to death and doing tests that adequately cover what should be covered. Edge cases, e.g. null inputs, or checking that inputs are numeric by testing an alphabetic character or a punctuation character, are likely most of the tests you'd need. To complement this, there may be one or two common cases to handle, to show the program can handle a non-edge case as well. Covering all valid input in most programs is overkill and would result in an overwhelmingly large number of tests.
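A minimal sketch of what such tests might look like, assuming a hypothetical parseQuantity function under test (both the function and its contract are invented for illustration):

#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical function under test: parses a non-negative integer,
// returning std::nullopt for empty or non-numeric input.
std::optional<int> parseQuantity(const std::string& s) {
    if (s.empty()) return std::nullopt;
    int value = 0;
    for (char c : s) {
        if (!std::isdigit(static_cast<unsigned char>(c))) return std::nullopt;
        value = value * 10 + (c - '0');
    }
    return value;
}

int main() {
    assert(!parseQuantity("").has_value());     // edge: empty input
    assert(!parseQuantity("abc").has_value());  // edge: alphabetic characters
    assert(!parseQuantity("4!2").has_value());  // edge: punctuation
    assert(parseQuantity("42") == 42);          // one common case to round it out
}

Four tests cover the edges plus one common case; enumerating every valid number would add nothing.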

I think the answer to the question you're asking boils down to designing for testability. Often you get a testable design for free when you test-drive the development of the solution. But let's face it, when you're implementing a highly mathematical algorithm, this just doesn't fall out.
To make sure you have a testable design, you need to understand what a seam is. Then you need to know a few rules of thumb, such as avoiding statics, using polymorphism, and properly decomposing problems and separating concerns.
Watch "The Clean Code Talks -- Unit Testing" by Misko Hevery, I think it will help you wrap your head around it.

Try looking at it from a requirements point of view, rather than an implementation point of view. Before you write code, you must understand exactly what you want it to do.
Testing and requirements should be a matching pair. If you're having trouble defining tests, maybe it's because the requirements are not well-defined. That in turn implies that you may have bugs that aren't so much implementation bugs as "lack of clear requirements" bugs. The code writer in that case would be working to a mental list of requirements that he or she thinks are the requirements, but can't be sure, since they're not written down for independent understanding and verification.
I've struggled with software where the requirements weren't clear, because the customer couldn't even tell us what they wanted. But when we delivered to them, they sure could tell us then what they didn't like about it! A big part of software engineering is getting the requirements right before the coding begins. This is true on the high-level (overall product, with requirement input from customer) and also the smaller level (modules, individual functions, where requirements are internally defined by software team or individuals). It is still true to some degree I think for iterative development, although the high-level requirements are more fluid.

@Bystrik Jurina,
I often get involved in projects which involve conversions between disparate data formats. Most answers have focused on testing a B-tree or similar algorithm, but it seems that you're looking for a more general answer.
Most of my work is based on the command line. It may sound like a contradiction, but one of the first tools I use is visualization. I'll write some methods to write out my data structures in a format that's easy to consume. This can (and usually does) include something that's visually clear. But often it also means something that I could easily parse with a smaller test program, or even import into Excel.
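As a sketch of that first step, a dump method can be as simple as writing CSV that Excel, a diff tool, or a small test program can consume (the Point structure and the file name are invented for illustration):

#include <cstdio>
#include <vector>

struct Point { double x, y; };  // stand-in for a real data structure

// Dump in a form that's trivial to eyeball, diff, or import into Excel.
void dumpCsv(const std::vector<Point>& pts, const char* path) {
    std::FILE* f = std::fopen(path, "w");
    if (!f) return;
    std::fprintf(f, "x,y\n");                 // header row for Excel
    for (const Point& p : pts)
        std::fprintf(f, "%g,%g\n", p.x, p.y);
    std::fclose(f);
}

int main() {
    dumpCsv({{1.0, 2.5}, {3.0, 4.5}}, "points.csv");
}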
I'll start by focusing on the basic outline, and write a program that does the bare minimum of what I need to accomplish. If it's a multi-step process, this might mean implementing one step at a time and validating the results of each step before moving on. Or writing something that works only in specific cases, and then expanding the set of cases where it's expected to work. At first you can validate that the code works on a limited set of cases, such as known input data. As the project moves forward, you can start logging warnings for cases you might not have tested, or for unexpected types of input data. This has drawbacks, but is a nice approach when you're dealing with a known set of input data.
Validation techniques can include formal test cases, or informal programs that work to challenge your assumptions. It could mean writing a basic driver program to exercise the "core" routines. A good example would be to add a record to a database, then read it back and compare the original object against the one loaded from the database.
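A sketch of that database round trip, with Record, saveRecord and loadRecord as invented stand-ins for a real persistence layer:

#include <cassert>
#include <string>

struct Record {
    int id;
    std::string name;
    bool operator==(const Record& other) const {
        return id == other.id && name == other.name;
    }
};

// Stand-ins for real database calls; an actual driver would hit the DB.
Record g_storage;                                    // fake "database"
void   saveRecord(const Record& r) { g_storage = r; }
Record loadRecord(int)             { return g_storage; }

int main() {
    Record original{7, "widget"};
    saveRecord(original);             // write to the "database"
    Record loaded = loadRecord(7);    // read it back
    assert(loaded == original);       // the round trip must be lossless
}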
If you have trouble wrapping your head around the way a program functions, think about what it needs to accomplish. It might be easier to write code that tests the way different inputs produce different outputs. Producing visualizations is a great help, because the act of deciding how to display the data can make you think about different conditions and focus in on the most critical parts of your data structures.
Often I've found that building a visualization brings me to admit that the way the data is being stored just isn't very clear. For a B-tree, the representation isn't very flexible. But for other cases, you may be using parallel arrays when a nested tree of objects would be more natural.

Related

Red, green, refactor - why refactor?

I am trying to learn TDD and unit testing concepts and I have seen the mantra: "red, green, refactor." I am curious about why should you refactor your code after the tests pass?
This makes no sense to me, because if the tests pass, then why are you messing with the code? I also see TDD mantras like "only write enough code to make the test pass."
The only reason I could come up with is that, to make the test pass and go green, you just sloppily write any old code. You just hack together a solution to get a passing test. Then obviously the code is a mess, so you can clean it up.
EDIT:
I found this link on another stackoverflow post which I think confirms the only reason I came up with, that the original code to 'pass' the test can be very simple, even hardcoded: http://blog.extracheese.org/2009/11/how_i_started_tdd.html
Usually the first working version of the code - even if not a mess - still can be improved. So you improve it, making it cleaner, more readable, removing duplication, finding better variable/method names etc. This is refactoring. And since you have the tests, you can refactor safely, because the tests will show if you have inadvertently broken something.
Note that usually you are not writing code from scratch, but modifying/extending existing code to add/change functionality. And the existing code may not be ready to accommodate the new functionality seamlessly. So the first implementation of the new functionality may look awkward or inconvenient, or you may see that it is difficult to extend further. So you improve the design to incorporate all existing functionality in the simplest, cleanest possible way while still passing all the tests.
Your question is a rehash of the age-old "if it works, don't fix it". However, as Martin Fowler explains in Refactoring, code can be broken in many different ways. Even if it passes all the tests, it can be hard to understand, thus hard to extend and maintain. Moreover, if it looks sloppy, future programmers will take even less care to keep it tidy, so it will deteriorate ever quicker, and eventually degrade into a completely unmaintainable mess. To prevent this, we refactor to keep the code as clean and tidy as possible at all times. If we (or our predecessors) have already let it become messy, refactoring is a huge effort with no obvious immediate benefit for management and stakeholders; thus they can hardly be convinced to support a large-scale refactoring in practice. Therefore we refactor in small, even trivial steps, after every code change.
I have seen the mantra: "red, green, refactor."
It's not a 'mantra', it's a routine.
I also see TDD mantras like "only write enough code to make the test pass."
That's a guideline.
Now, your question:
The only reason I could come up with is that, to make the test pass and go green, you just sloppily write any old code. You just hack together a solution to get a passing test. Then obviously the code is a mess, so you can clean it up.
You're almost there. The key is in the 'Design' part of TDD. You're not only coding, you're still designing your solution. That means the exact API might not yet be set in stone, and your tests might not reflect the final design (because it's not done yet). While coding "only enough to pass the test", you will hit issues that might change your mind and guide the design. Only after you have some working code are you able to improve it.
Also, the refactor step involves the whole code, not only what you've just written to pass the last test. As the coding advances, you have more and more complex interactions between all parts of your code, the best time to refactor it is as soon as it's working.
Precisely because of this very early refactoring step, you shouldn't worry about the quality of the first iteration. It's just a proof of concept that helps in the design.
It's hard to see how the OP's skepticism isn't justified. TDD's workflow is rooted in the avoidance of premature design decisions: it imposes a significant cost on, if not precludes, 'seat of the pants' coding that could quickly devolve into an ill-advised YAGNI safari.[1]
The mechanism for this deferral of premature design is the 'smallest possible test'/'smallest possible code' workflow, which is designed to stave off the temptation to 'fix' a perceived shortcoming or requirement before it would ordinarily need to be addressed or even encountered, i.e., presumably the shortcoming would (or ought to) be addressed in some future test case mapped directly to an acceptance criterion that in turn captures a particular business objective.
Furthermore, tests in TDD are supposed to a) help clarify design requirements, b) surface problems with a design[2], and c) serve as project assets that capture and document the effort applied to a particular story. Substituting a self-directed refactoring effort for a properly composed test not only forgoes any insight the test might provide but also denies management and project planners information on the true cost of implementing a particular feature.[3]
Accordingly, I would suggest that a new test case, purpose built for introducing an additional requirement into the design, is the proper way to address any perceived shortcoming beyond a stylistic change to the current code under test, and the 'Refactor' phase, however well-intentioned, flies in the face of this philosophy, and is in fact an invitation to do the very sort of premature, YAGNI design safaris that TDD is supposed to prevent. I believe that Robert Martin's version of the 3 rules is consistent with this interpretation. [4 - A blatant appeal to authority]
[1] The previously cited http://blog.extracheese.org/2009/11/how_i_started_tdd.html elegantly demonstrates the value of deferring design decisions until the last possible moment. (Although perhaps the Fibonacci sequence is a somewhat artificial example).
[2] See https://blog.thecodewhisperer.com/permalink/how-a-smell-in-the-tests-points-to-a-risk-in-the-design
[3] Adding a "tech" or "spike" story (smell or not) to the backlog would be the appropriate method for ensuring that formal processes are followed and development effort is documented and justified... and if you can't convince the Product Owner to add it, then you shouldn't be throwing time at it.
[4] http://www.butunclebob.com/ArticleS.UncleBob.TheThreeRulesOfTdd
Because you should never refactor non-working code. If you do, then you won't know whether the errors were originally in there or were introduced by your refactoring. If the tests all pass before refactoring, then fail afterward, you know the change you made broke something.
They don't mean to write any sloppy old code to pass a test. There is a difference between minimal and sloppy. A zen garden is minimal, but not sloppy.
However, the minimal changes you made here and there might, in retrospect, be better combined into some other procedure that is called by both of them. Once both tests are working separately, that is the time to refactor. It's easier to refactor than to try to guess an architecture that's going to minimally cover all the test cases.
You make the code behave correctly first, then factor it well. If you do it the other way around you run the risk of making a mess/duplication/code smells while fixing it.
It's usually easier to restructure working code into well factored code than it is to try and design well factored code upfront.
The reason for refactoring working code is for maintenance. You want to remove duplication for reasons such as only having to fix something in one place, and also knowing that when you fix something somewhere you haven't missed the same bug in the similar code elsewhere. You want to rename vars, methods, classes if their meaning has changed from what you originally intended.
Overall, writing working code is non-trivial, and writing well factored code is non-trivial. If you are trying to do both at once you may do neither to your full potential, so giving full attention to one first and then the other is useful.
Iterative, Evolutionary Refactoring is a good approach, but first...
Some things that should not go unsaid...
To build on some of the high-level notes above, you should understand a few important concepts from Complex Systems Theory. The key concepts revolve around a system's environmental structure, how a system grows, how it behaves, and how its components interact.
Sensitive Dependence Upon Initial Conditions (Chaos Theory):
A system's behavior will be amplified toward its most influential tendency, meaning: if you have many broken windows that influence how a developer writes the next module or interacts with an existing one, then this developer is more likely to break another window. It's even tempting to break a window just because it's the only one not broken.
Entropy:
There are many, many definitions of entropy out there; one that I find fitting for Software Engineering is: the amount of energy in a system which cannot be used for additional work. This is why reusability is crucial. Entropy is found mostly in duplicate logic and poor comprehensibility. Furthermore, this ties closely back to the Butterfly Effect (Sensitive Dependence Upon Initial Conditions) and Broken Windows: the more duplicate logic there is, the more copy-paste there will be for additional implementations, and the more effort per implementation it takes to maintain it all.
Variable Amplification and Dampening (Emergence Theory and Network Theory):
Breaking a bad design is a good thing, though it seems all hell breaks loose when it happens the first few times. This is why it is sensible to have an Architecture which can support many adaptations. As your system heads toward entropy, you need a way for modules to interact with each other correctly; this is where Interfaces come in. Your modules should not be able to interact unless they've agreed to a consistent contract. Without this, you'll see your system immediately start adapting to poor implementations, and whichever wheel is squeakiest will get the oil; the other modules will become a headache. So, not only do bad implementations cause more bad implementations, they also create undesirable behavior at the system's scale, causing your system at large to adapt to varying implementations and amplifying entropy at the highest scale. When this happens, all you can do is keep patching and hope that one change will not conflict with these adaptations, causing emergent, unpredictable bugs.
The key to all this is to envelop your modules into their own, discrete subsystems, and provide a Defined Architecture which can allow them to communicate -- such as a Mediator. This brings a collection of (Decoupled) behavior into a Bottom-Up System which can then focus its complexity into a component designed exactly for it.
With this type of architectural approach, you shouldn't have significant pain on the 3rd term of "Red, Green, Refactor". The question is, how can your scrum master measure this in terms of benefit to the user & stakeholders?
You should not take the "only write enough code to make the test pass" mantra too literally.
Remember, your application isn't ready just because all your tests pass. You clearly would like to refactor your code after the tests pass to make sure the code is readable and well architected. The tests are there to help you refactor, so refactoring is a big part of TDD.
First, thanks for taking a look into Test Driven Development. It is an awesome technique that can be applied to many coding situations that can help you develop some great code while also giving you confidence in what the code can and can't do.
If you look at the subtitle on the cover of Martin Fowler's book "Refactoring", it also answers your question: "Improving the Design of Existing Code".
Refactorings are transformations to your code that should not alter the program's behavior.
By refactoring, you can make the program easier to maintain now, and 6 months from now, and it can also make the code easier for the next developer to understand.
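A minimal sketch of such a transformation, with invented names: duplicated discount logic is extracted into one helper, and the same assertions pass before and after, which is exactly what "should not alter the program's behavior" means in practice.

#include <cassert>

// Before the refactoring, this expression was duplicated inside both
// invoiceTotal and estimateTotal. Extracting it changes no behavior.
int applyDiscount(int cents) { return cents * 9 / 10; }  // 10% off

int invoiceTotal(int cents)  { return applyDiscount(cents); }
int estimateTotal(int cents) { return applyDiscount(cents); }

int main() {
    // These assertions passed before the refactoring and still pass after.
    assert(invoiceTotal(1000) == 900);
    assert(invoiceTotal(1000) == estimateTotal(1000));
}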

Do very long methods always need refactoring?

I face a situation where we have many very long methods, 1000 lines or more.
To give you some more detail: we have a list of incoming high-level commands, and each results in a longer (sometimes huge) list of lower-level commands. There's a factory creating an instance of a class for each incoming command. Each class has a process method, where all the lower-level commands are added in sequence. As I said, these sequences of commands and their parameters quite often cause the process methods to reach thousands of lines.
There is a lot of repetition. Many command patterns are shared between different commands, but the code is repeated over and over. That leads me to think refactoring would be a very good idea.
On the other hand, the specs we have come in exactly the same form as the current code: a very long list of commands for each incoming one. When I've tried some refactoring, I've started to feel uncomfortable with the specs. I lose the obvious correspondence between the specs and the code, and waste time digging into newly created common classes.
Then here the question: in general, do you think such very long methods would always need refactoring, or in a similar case it would be acceptable?
(unfortunately refactoring the specs is not an option)
edit:
I have removed every reference to "generate" because it was actually confusing. It's not auto-generated code.
class InCmd001 {
    OutMsg process( InMsg& inMsg ) {
        OutMsg outMsg = OutMsg::Create();

        OutCmd001 outCmd001 = OutCmd001::Create();
        outCmd001.SetA( param.getA() );  // 'param' presumably holds per-command configuration
        outCmd001.SetB( inMsg.getB() );
        outMsg.addCmd( outCmd001 );

        OutCmd016 outCmd016 = OutCmd016::Create();
        outCmd016.SetF( param.getF() );
        outMsg.addCmd( outCmd016 );

        OutCmd007 outCmd007 = OutCmd007::Create();
        outCmd007.SetR( inMsg.getR() );
        outMsg.addCmd( outCmd007 );

        // ......
        return outMsg;
    }
};
Here is an example of one incoming command class (manually written in pseudo-C++).
Code never needs refactoring. The code either works, or it doesn't. And if it works, the code doesn't need anything.
The need for refactoring comes from you, the programmer. The person reading, writing, maintaining and extending the code.
If you have trouble understanding the code, it needs to be refactored. If you would be more productive by cleaning up and refactoring the code, it needs to be refactored.
In general, I'd say it's a good idea for your own sake to refactor 1000+ line functions. But you're not doing it because the code needs it. You're doing it because that makes it easier for you to understand the code, test its correctness, and add new functionality.
On the other hand, if the code is automatically generated by another tool, you'll never need to read it or edit it. So what'd be the point in refactoring it?
I understand exactly where you're coming from, and can see exactly why you've structured your code the way it is, but it needs to change.
The uncertainty you feel when you attempt to refactor can be ameliorated by writing unit tests. If you have tests specific to each spec, then the code for each spec can be refactored until you're blue in the face, and you can have confidence in it.
A second option, is it possible to automatically generate your code from a data structure?
If you have a core suite of classes that do the donkey work and handle the edge cases, you can auto-generate the repetitive 1000-line methods as often as you wish.
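A sketch of that idea, using invented stand-ins for the message types in the question: each spec becomes a data table, and a single generic loop replaces each thousand-line process method.

#include <functional>
#include <string>
#include <vector>

// Invented minimal stand-ins for the real message types.
struct InMsg  { int getB() const { return 42; } };
struct OutCmd { std::string name; int value; };
struct OutMsg {
    std::vector<OutCmd> cmds;
    void addCmd(OutCmd c) { cmds.push_back(c); }
};

// One table row per output command: its name plus how to fill it.
struct CmdSpec {
    std::string name;
    std::function<int(const InMsg&)> fill;
};

// The generic driver that replaces each hand-written process() method.
OutMsg process(const InMsg& in, const std::vector<CmdSpec>& table) {
    OutMsg out;
    for (const CmdSpec& row : table)
        out.addCmd({row.name, row.fill(in)});  // same order as the spec
    return out;
}

// The spec for "incoming command 001", now data instead of code. A code
// generator (or a human) only has to maintain tables like this one.
const std::vector<CmdSpec> inCmd001Table = {
    {"OutCmd001", [](const InMsg& m) { return m.getB(); }},
    {"OutCmd016", [](const InMsg&)   { return 7; }},  // fixed parameter
};

Kept one row per spec line, the tables preserve the line-by-line correspondence with the specs that the question is worried about losing.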
However, there are exceptions to every rule.
If the methods are a literal interpretation of the spec (very little additional logic), and the specs change infrequently, and the "common" portions (i.e. bits that happen to be the same right now) of the specs change at different times, and you're not going to be asked to get a 10x performance gain out of the code anytime soon, then (and only then) . . . you may be better off with what you have.
. . . but on the whole, refactor.
Yes, always. 1000 lines is at least 10x longer than any function should ever be, and I'm tempted to say 100x, except that when dealing with input parsing and validation it can become natural to write functions with 20 or so lines.
Edit: Just re-read your question and I'm not clear on one point - are you talking about machine generated code that no-one has to touch? In which case I would leave things as they are.
Refactoring is not the same as writing from scratch. While you should never write code like this, before you refactor it you need to consider the costs of refactoring in terms of time spent, the associated risks in terms of breaking code that already works, and the net benefits in terms of future time saved. Refactor only if the net benefits outweigh the associated costs and risks.
Sometimes wrapping and rewriting can be a safer and more cost effective solution, even if it appears expensive at first glance.
Long methods need refactoring if they are maintained (and thus need to be understood) by humans.
As a rule of thumb, code for humans first. I don't agree with the common idea that functions need to be short. I think what you need to aim for is that when a human reads your code, they grok it quickly.
To this effect, it's a good idea to simplify things as much as possible, but not more than that. It's a good idea to delegate roughly one task to each function. There is no rule as to what "roughly one task" means: you'll have to use your own judgement for that. But do recognize that a function split into too many other functions itself reduces readability. Think about the human being who reads your function for the first time: they would have to follow one function call after another, constantly context-switching and maintaining a stack in their mind. This is a task for machines, not for humans.
Find the balance.
Here, you see how important naming things is. You will find it is not that easy to choose names for variables and functions; it takes time, but on the other hand it can save a lot of confusion on the human reader's side. Again, find the balance between saving your own time and the time of the friendly humans who will follow you.
As for repetition, it's a bad idea. It's something that needs to be fixed, just like a memory leak. It's a ticking bomb.
As others have said before me, changing code can be expensive. You need to think about whether it will pay off to spend all this time and effort, and face the risks of change, for better code. You will possibly lose lots of time and give yourself one headache after another now, in order to possibly save lots of time and headaches later.
Take a look at the related question How many lines of code is too many?. There are quite a few tidbits of wisdom throughout the answers there.
To repost a quote (although I'll attempt to comment on it a little more here)... A while back, I read this passage from Ovid's journal:
I recently wrote some code for Class::Sniff which would detect "long methods" and report them as a code smell. I even wrote a blog post about how I did this (quelle surprise, eh?). That's when Ben Tilly asked an embarrassingly obvious question: how do I know that long methods are a code smell?
I threw out the usual justifications, but he wouldn't let up. He wanted information and he cited the excellent book Code Complete as a counter-argument. I got down my copy of this book and started reading "How Long Should A Routine Be" (page 175, second edition). The author, Steve McConnell, argues that routines should not be longer than 200 lines. Holy crud! That's waaaaaay too long. If a routine is longer than about 20 or 30 lines, I reckon it's time to break it up.
Regrettably, McConnell has the cheek to cite six separate studies, all of which found that longer routines were not only not correlated with a greater defect rate, but were also often cheaper to develop and easier to comprehend. As a result, the latest version of Class::Sniff on github now documents that longer routines may not be a code smell after all. Ben was right. I was wrong.
(The rest of the post, on TDD, is worth reading as well.)
Coming from the "shorter methods are better" camp, this gave me a lot to think about.
Previously my large methods were generally limited to "I need inlining here, and the compiler is being uncooperative", or "for one reason or another the giant switch block really does run faster than the dispatch table", or "this stuff is only called exactly in sequence and I really really don't want function call overhead here". All relatively rare cases.
In your situation, though, I'd have a large bias toward not touching things: refactoring carries some inherent risk, and it may currently outweigh the reward. (Disclaimer: I'm slightly paranoid; I'm usually the guy who ends up fixing the crashes.)
Consider spending your efforts on tests, asserts, or documentation that can strengthen the existing code and tilt the risk/reward scale before any attempt to refactor: invariant checks, bound function analysis, and pre/postcondition tests; any other useful concepts from DBC; maybe even a parallel implementation in another language (maybe something message oriented like Erlang would give you a better perspective, given your code sample) or even some sort of formal logical representation of the spec you're trying to follow if you have some time to burn.
Any of these kinds of efforts generally have a few results, even if you don't get to refactor the code: you learn something, you increase your (and your organization's) understanding of and ability to use the code and specifications, you might find a few holes that really do need to be filled now, and you become more confident in your ability to make a change with less chance of disastrous consequences.
As you gain a better understanding of the problem domain, you may find that there are different ways to refactor that you hadn't thought of previously.
This isn't to say "thou shalt have a full-coverage test suite, and DBC asserts, and a formal logical spec". It's just that you are in a typically imperfect situation, and diversifying a bit -- looking for novel ways to approach the problems you find (maintainability? fuzzy spec? ease of learning the system?) -- may give you a small bit of forward progress and some increased confidence, after which you can take larger steps.
So think less from the "too many lines is a problem" perspective and more from the "this might be a code smell, what problems is it going to cause for us, and is there anything easy and/or rewarding we can do about it?"
Leaving it cooking on the backburner for a bit -- coming back and revisiting it as time and coincidence allows (e.g. "I'm working near the code today, maybe I'll wander over and see if I can't document the assumptions a bit better...") may produce good results. Then again, getting royally ticked off and deciding something must be done about the situation is also effective.
Have I managed to be wishy-washy enough here? My point, I think, is that the code smells, the patterns/antipatterns, the best practices, etc -- they're there to serve you. Experiment to get used to them, and then take what makes sense for your current situation, and leave the rest.
I think you first need to "refactor" the specs. If there are repetitions in the specs, they too will become easier to read if they make use of some "basic building blocks".
Edit: As long as you cannot refactor the specs, I wouldn't change the code.
Coding style guides are all made for easier code maintenance, but in your special case the ease of maintenance is achieved by following the spec.
Some people here asked if the code is generated. In my opinion it does not matter: If the code follows the spec "line by line" it makes no difference if the code is generated or hand-written.
A thousand lines of code is nothing. We have functions that are 6 to 12 thousand lines long. Of course those functions are so big that things literally get lost in there, and no tool can help us even look at high-level abstractions of them. The code is now, unfortunately, incomprehensible.
My opinion of functions that big is that they were not written by brilliant programmers but by incompetent hacks who shouldn't be left anywhere near a computer - they should be fired and left flipping burgers at McDonald's. Such code wreaks havoc by leaving behind features that cannot be added to or improved upon (too bad for the customer). The code is so brittle that it cannot be modified by anyone - not even the original authors.
And yes, those methods should be refactored, or thrown away.
Do you ever have to read or maintain the generated code?
If yes, then I'd think some refactoring might be in order.
If no, then the higher-level language is really the language you're working with -- the C++ is just an intermediate representation on the way to the compiler -- and refactoring might not be necessary.
Looks to me that you've implemented a separate language within your application - have you considered going that way?
It has been my understanding that it's recommended that any method over 100 lines of code be refactored.
I think some rules may be a little different in this era, when code is most commonly viewed in an IDE. If the code does not contain exploitable repetition, such that there are 1,000 lines which are going to be referenced once each, and which share a significant number of variables in a clear fashion, dividing the code into 100-line routines each of which is called once may not be much of an improvement over a well-formatted 1,000-line module which includes #region tags or the equivalent to allow outline-style viewing.
My philosophy is that certain layouts of code generally imply certain things. To my mind, when a piece of code is placed into its own routine, that suggests that the code will be usable in more than one context (exception: callback handlers and the like in languages which don't support anonymous methods). If code segment #1 leaves an object in an obscure state which is only usable by code segment #2, and code segment #2 is only usable on a data object which is left in the state created by #1, then absent some compelling reason to put the segments in different routines, they should appear in the same routine. If a program puts objects through a chain of obscure states extending for many hundreds of lines of code, it might be good to rework the design of the code to subdivide the operation into smaller pieces which have more "natural" pre- and post- conditions, but absent some compelling reason to do so, I would not favor splitting up the code without changing the design.
For further reading, I highly recommend the long, insightful, entertaining, and sometimes bitter discussion of this topic over on the Portland Pattern Repository.
I've seen cases where it is not the case (for example, creating an Excel spreadsheet in .Net often requires a lot of lines of code for the formatting of the sheet), but most of the time the best thing would indeed be to refactor it.
I personally try to make a function small enough so it all appears on my screen (without affecting the readability of course).
1000 lines? They definitely need to be refactored. Also note that, for example, the default maximum number of executable statements is 30 in Checkstyle, a well-known coding standard checker.
If you refactor, when you refactor, add some comments to explain what the heck it's doing.
If it had comments, it would be much less likely a candidate for refactoring, because it would already be easier to read and follow for someone starting from scratch.
Then here the question: in general, do you think such very long methods would always need refactoring,
If you ask in general, we will say yes.
or in a similar case it would be acceptable? (unfortunately refactoring the specs is not an option)
Sometimes it is acceptable, but that is very unusual. I will give you a couple of examples:
There are some 8-bit microcontrollers, the Microchip PIC family, that have only a fixed 8-level stack, so you can't nest more than 8 calls. Care must be taken to avoid stack overflow, so in this special case having many small (nested) functions is not the best way to go.
Another example is very low-level code optimization, where you have to take into account the cost of jumps and context saving. Use it with care.
EDIT:
Even with generated code, you might need to refactor the way it is generated, for example for memory savings, energy savings, human readability, beauty, who knows, etc.
There has been very good general advice; here is a practical recommendation for your sample:
Common patterns can be isolated in plain feeder methods:
void AddSimpleTransform( OutMsg & msg, InMsg const & inMsg,
                         int rotateBy, int foldBy, int gonkBy = 0 )
{
    // create & add up to three messages
}
You might even improve that by making this a member of OutMsg, and using a fluent interface, such that you can write
OutMsg msg;
msg.AddSimpleTransform(inMsg, 12, 17)
   .Staple("print")
   .AddArtificialRust(0.02);
which can be an additional improvement under some circumstances.

Any advice for a developer given the task of enhancing & refactoring a business critical application?

Recently I inherited a business-critical project at work to "enhance". The code has been worked on and passed through many hands over the past five years. Consultants and full-time employees who are no longer with the company have butchered this very delicate and overly sensitive application. Most of us have to deal with legacy code or this type of project... it's part of being a developer... but...
There are zero unit tests and zero system tests. Logic is intermingled (and sometimes duplicated for no reason) between stored procedures, views (yes, I said views) and code. Documentation? Yeah, right.
I am scared. Yes, very scared to make even the most minimal tweak or refactoring. One little mishap, and there would be major income loss and potential legal issues for my employer.
So, any advice? My first thought would be to begin writing assertions/unit tests against the existing code. However, that can only go so far, because there is a lot of logic embedded in stored procedures. (I know it's possible to test stored procedures, but historically it's much more difficult compared to unit testing source code logic.)
Another (or additional) approach would be to capture the database state before and after the application performs a function, make some code changes, and then compare the database states again.
I just rewrote thousands of lines of the most complex subsystem of an enterprise filesystem to make it multi-threaded, so all of this comes from experience. If the rewrite is justified (it is if the rewrite is being done to significantly enhance capabilities, or if existing code is coming in the way of putting in more enhancements), then here are the pointers:
You need to be confident in your own abilities first of all to do this. That comes only if you have enough prior experience with the technologies involved.
Communicate, communicate, communicate. Let all involved stakeholders know: this is a mess, this is risky, this cannot be done in a hurry, and this will need to be done piecemeal - attack one area at a time.
Understand the system inside out. Document every nuance, trick and hack. Document the overall design. Ask any old-timers about historical reasons for the existence of any code you cannot justify. These are the mines you don't want to step on - you might think those are useless pieces of code and then regret later after getting rid of them.
Unit test. Work the system through any test-suite which already exists, otherwise first write the tests for existing code, if they don't exist.
Spew debugging code all over the place during the rewrite - asserts, logging, console prints (you should have the ability to turn them on and off, as well as specify different levels of output, i.e. control verbosity; see the sketch after this list). This is a must in my experience, and helps tremendously during a rewrite.
When going through the code, make a list of all things that need to be done - things you need to find out, things you need to write tests for, things you need to ask questions about, notes to remind you how to refactor some piece of code, anything that can affect your rewrite... you cannot afford to forget anything! I do this using Outlook Tasks (just make sure whatever you use is always in front of you - this is the first app I open as soon as I sit down on the desk). If I get interrupted, I write down anything that I have been thinking about and hints about where to continue after coming back to the task.
Try avoiding hacks in your rewrite (that's one of the reasons you are rewriting it). Think about tough problems you encounter. Discuss them with other people and bounce off your ideas against them (nothing beats this), and put in clean solutions. Look at all the tasks you put into the todo list - make a 10,000 feet picture of existing design, then decide how the new rewrite would look like (in terms of modules, sub-modules, how they fit together etc.).
Tackle the toughest problems before any others. That'll save you from running into problems you cannot solve near the end of the tunnel, and save you from taking any steps backward. Of course, you need to know what the toughest problems will be - so again, better to document everything first during your forays into the existing code.
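A minimal sketch of the switchable, leveled debug output mentioned above (the macro and the level names are invented):

#include <cstdio>

enum LogLevel { LOG_OFF = 0, LOG_ERROR, LOG_INFO, LOG_TRACE };
static LogLevel g_logLevel = LOG_INFO;  // turn up during the rewrite

// Print only when the message's level is enabled; costs one comparison when quiet.
#define DBG(level, ...) \
    do { if ((level) <= g_logLevel) std::fprintf(stderr, __VA_ARGS__); } while (0)

int main() {
    DBG(LOG_INFO,  "migrating subsystem: %d items\n", 42);  // shown at INFO
    DBG(LOG_TRACE, "lock acquired on node %d\n", 7);        // hidden at INFO
    g_logLevel = LOG_OFF;                                   // ship silent
    DBG(LOG_ERROR, "now suppressed as well\n");
}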
Get a very firm list of requirements.
Make sure you have implicit requirements as well as explicit ones - i.e. what programs it has to work with, and how.
Write all scenarios and use cases for how it is currently being used.
Write a lot of unit tests.
Write a lot of integration tests to test the integration of the program with existing programs it has to work with.
Talk to everyone who uses the program to find out more implicit requirements.
Test, test, test changes before moving into production.
CYA :)
Two things, beyond @Sudhanshu's great list (and, to some extent, disagreeing with his #8):
First, be aware that untested code is buggy code - what you are starting with almost certainly does not work correctly, for any definition of "correct" other than "works just like the unmodified code". That is, be prepared to find unexpected behavior in the system, to ask experts in the system about that behavior, and for them to conclude that it's not working the way it should. Prepare them for it, too - warn them that without tests or other documentation, there's no reason to think it works the way they think it's working.
Next: refactor the low-hanging fruit. Take it easy, take it slow, take it very carefully. Notice something easy in the code - duplication, say - and test the hell out of whatever methods contain the duplication, then eliminate it. Lather, rinse, repeat. Don't write tests for everything before making changes, but write tests for whatever you're changing. This way, the code stays releasable at every stage and you are continuously adding value, continuously improving the code base.
I said "two things", but I guess I'll add a third: Manage expectations. Let your customer know how scared you are of this task; let them know how bad what they've got is. Let them know how slow progress will be, and let them know you'll keep them informed of that progress (and, of course, do it). Your customer may think s/he's asking for "just a little fix" - and the functionality may indeed change only a little - but that doesn't mean it's not going to be a lot of work and a lot of time. You understand that; your customer needs to, too.
I've had this problem before and I've asked around (before the days of stack overflow) and this book has always been recommended to me. http://www.amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/0131177052
Ask yourself this: what are you trying to achieve? What is your mission? How much time do you have? What is the measurement for success? What risks are there? How do you mitigate and deal with them?
Don't touch anything unless you know what it is you're trying to achieve.
The code might be "bad", but what does that mean? The code works, right? So if you rewrite the code so it does the same thing, you'll have spent a lot of time rewriting something, introducing bugs along the way, so that the code does the same thing? To what end?
The simplest thing you can do is document what the system does. And I don't mean write mind-numbing Word documents no one will ever read. I mean writing tests on key functionality, refactoring the code if necessary to allow such tests to be written.
You said you are scared to touch the code because of legal issues, income loss, and the fact that there is zero documentation. So, do you understand the code? The first thing you should do is document it and make sure you understand it before you even think about refactoring. Once you have done that and identified the problem areas, make a list of your refactoring proposals in order of maximum benefit with minimum changes, and attack it incrementally. Refactoring makes additional sense if: the expected lifespan of the code will be long, new features will be added, or bug fixes are numerous. As for testing the database state: I worked on a project recently where that is exactly what we did, with success.
Is it possible to get a separation of the DB and non-DB parts, so that a DBA can take on the challenge of the stored procedures and the databases themselves, freeing you up to work on the other parts of the system? This presumes that there is a DBA who can step up and take on that part of the application.
If that isn't possible, then I'd make the suggestion of seeing how big is the codebase and if it is possible to get some assistance so it isn't all on you. While this could be seen as side-stepping responsibility, the point would be that things shouldn't be in just one person's hands usually as they can disappear at times.
Good luck!

Does YAGNI also apply when writing tests?

When I write code I only write the functions I need as I need them.
Does this approach also apply to writing tests?
Should I write a test in advance for every use-case I can think of just to play it safe or should I only write tests for a use-case as I come upon it?
I think that when you write a method you should test both expected and potential error paths. This doesn't mean that you should expand your design to encompass every potential use (leave that for when it's needed), but you should make sure that your tests have defined the expected behavior in the face of invalid parameters or other conditions.
YAGNI, as I understand it, means that you shouldn't develop features that are not yet needed. In that sense, you shouldn't write a test that drives you to develop code that's not needed. I suspect, though, that's not what you are asking about.
In this context I'd be more concerned with whether you should write tests that cover unexpected uses - for example, errors due to passing null or out-of-range parameters - or repeat tests that differ only with respect to the data, not the functionality. In the former case, as I indicated above, I would say yes. Your tests will document the expected behavior of your method in the face of errors. This is important information to people who use your method.
In the latter case, I'm less able to give you a definitive answer. You certainly want your tests to remain DRY - don't write a test that simply repeats another test even if it has different data. On the other hand, you may not discover potential design issues unless you exercise the edge cases of your data. A simple example is a method that computes the sum of two integers: what happens if you pass it maxint as both parameters? If you only have one test, then you may miss this behavior. Obviously, this is related to the previous point. Only you can be sure when a test is really needed or not.
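A minimal sketch of that maxint test, assuming a hypothetical safeAdd that is specified to report overflow rather than wrap:

#include <cassert>
#include <climits>
#include <optional>

// Hypothetical function under test: returns std::nullopt on overflow.
std::optional<int> safeAdd(int a, int b) {
    if (b > 0 && a > INT_MAX - b) return std::nullopt;  // would overflow
    if (b < 0 && a < INT_MIN - b) return std::nullopt;  // would underflow
    return a + b;
}

int main() {
    assert(safeAdd(2, 2) == 4);                      // the obvious happy path
    assert(!safeAdd(INT_MAX, INT_MAX).has_value());  // the edge case YAGNI might skip
}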
Yes YAGNI absolutely applies to writing tests.
As an example, I, for one, do not write tests to check any Properties. I assume that properties work a certain way, and until I come to one that does something different from the norm, I won't have tests for them.
You should always consider the validity of writing any test. If there is no clear benefit to you in writing the test, then I would advise that you don't. However, this is clearly very subjective, since what you might think is not worth it someone else could think is very worth the effort.
Also, would I write tests to validate input? Absolutely. However, I would only do it up to a point. Say you have a function with 3 parameters that are ints and it returns a double. How many tests are you going to write around that function? I would use YAGNI here to determine which tests are going to get you a good ROI, and which are useless.
Write the test as you need it. Tests are code. Writing a bunch of (initially failing) tests up front breaks the red/fix/green cycle of TDD, and makes it harder to identify valid failures vs. unwritten code.
You should write the tests for the use cases you are going to implement during this phase of development.
This gives the following benefits:
Your tests help define the functionality of this phase.
You know when you've completed this phase because all of your tests pass.
You should write tests that cover all your code, ideally. Otherwise, the rest of your tests lose value, and you will end up debugging that piece of code repeatedly.
So, no. YAGNI does not include tests :)
There is of course no point in writing tests for use cases you're not sure will get implemented at all - that much should be obvious to anyone.
For use cases you know will get implemented, test cases are subject to diminishing returns, i.e. trying to cover each and every possible obscure corner case is not a useful goal when you can cover all important and critical paths with half the work - assuming, of course, that the cost of overlooking a rarely occurring error is endurable; I would certainly not settle for anything less than 100% code and branch coverage when writing avionics software.
You'll probably get some variance here, but generally, the goal of writing tests (to me) is to ensure that all your code is functioning as it should, without side effects, in a predictable fashion and without defects. In my mind, then, the approach you discuss of only writing tests for use cases as they are come upon does you no real good, and may in fact cause harm.
What if the particular use case for the unit under test that you ignore causes a serious defect in the final software? Has the time spent developing tests bought you anything in this scenario beyond a false sense of security?
(For the record, this is one of the issues I have with using code coverage to "measure" test quality -- it's a measurement that, if low, may give an indication that you're not testing enough, but if high, should not be used to assume that you are rock-solid. Get the common cases tested, the edge cases tested, then consider all the ifs, ands and buts of the unit and test them, too.)
Mild Update
I should note that I'm coming from possibly a different perspective than many here. I often find that I'm writing library-style code, that is, code which will be reused in multiple projects, for multiple different clients. As a result, it is generally impossible for me to say with any certainty that certain use cases simply won't happen. The best I can do is either document that they're not expected (and hence may require updating the tests afterward), or - and this is my preference :) - just write the tests. I often find option #2 is far more livable on a day-to-day basis, simply because I have much more confidence when I'm reusing component X in new application Y. And confidence, in my mind, is what automated testing is all about.
You should certainly hold off writing test cases for functionality you're not going to implement yet. Tests should only be written for existing functionality or functionality you're about to put in.
However, use cases are not the same as functionality. You only need to test the valid use cases that you've identified, but there's going to be a lot of other things that might happen, and you want to make sure those inputs get a reasonable response (which could well be an error message).
Obviously, you aren't going to get all the possible use cases; if you could, there'd be no need to worry about computer security. You should get at least the more plausible ones, and as problems come up you should add them to the use cases to test.
I think the answer here is, as it is in so many places, it depends. If the contract that a function presents states that it does X, and I see that it's got associated unit tests, etc., I'm inclined to think it's a well-tested unit and use it as such, even if I don't use it that exact way elsewhere. If that particular usage pattern is untested, then I might get confusing or hard-to-trace errors. For this reason, I think a test should cover all (or most) of the defined, documented behavior of a unit.
If you choose to test more incrementally, I might add to the doc comments that the function is "only tested for [certain kinds of input], results for other inputs are undefined".
I frequently find myself writing tests, TDD, for cases that I don't expect the normal program flow to invoke. The "fake it 'til you make it" approach has me starting, generally, with a null input - just enough to have an idea in mind of what the function call should look like, what types its parameters will have and what type it will return. To be clear, I won't just send null to the function in my test; I'll initialize a typed variable to hold the null value; that way when Eclipse's Quick Fix creates the function for me, it already has the right type. But it's not uncommon that I won't expect the program normally to send a null to the function. So, arguably, I'm writing a test that I AGN. But if I start with values, sometimes it's too big a chunk. I'm both designing the API and pushing its real implementation from the beginning. So, by starting slow and faking it 'til I make it, sometimes I write tests for cases I don't expect to see in production code.
If you're working in a TDD or XP style, you won't be writing anything "in advance" as you say, you'll be working on a very precise bit of functionality at any given moment, so you'll be writing all the necessary tests in order make sure that bit of functionality works as you intend it to.
Test code is similar to "code" itself: you won't write code in advance for every use case your app has, so why would you write test code in advance?

Testing When Correctness is Poorly Defined?

I generally try to use unit tests for any code that has easily defined correct behavior given some reasonably small, well-defined set of inputs. This works quite well for catching bugs, and I do it all the time in my personal library of generic functions.
However, a lot of the code I write is data mining code that basically looks for significant patterns in large datasets. Correct behavior in this case is often not well defined and depends on a lot of different inputs in ways that are not easy for a human to predict (i.e. the math can't reasonably be done by hand, which is why I'm using a computer to solve the problem in the first place). These inputs can be very complex, to the point where coming up with a reasonable test case is near impossible. Identifying the edge cases that are worth testing is extremely difficult. Sometimes the algorithm isn't even deterministic.
Usually, I do the best I can by using asserts for sanity checks and creating a small toy test case with a known pattern and informally seeing if the answer at least "looks reasonable", without it necessarily being objectively correct. Is there any better way to test these kinds of cases?
I think you just need to write unit tests based on small sets of data that will make sure your code is doing exactly what you want it to do. Whether this gives you a reasonable data-mining algorithm is a separate issue, and I don't think it is possible to solve it with unit tests. There are two "levels" of correctness of your code:
Your code correctly implements the given data-mining algorithm (this is the thing you should unit-test)
The data mining algorithm you implement is "correct" - solves the business problem. This is a quite open question, it probably depends both on some parameters of your algorithm as well as on the actual data (different algorithms work for different types of data).
When facing cases like this I tend to build one or more stub data sets that reflect the proper underlying complexities of the real-life data. I often do this together with the customer, to make sure I capture the essence of the complexities.
Then I can just codify these into one or more datasets that can be used as basis for making very specific unit tests (sometimes they're more like integration tests with stub data, but I don't think that's an important distinction). So while your algorithm may have "fuzzy" results for a "generic" dataset, these algorithms almost always have a single correct answer for a specific dataset.
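For example, here is a sketch with an invented mostFrequentItem routine standing in for a real mining algorithm: its results may be "fuzzy" on generic data, but on this stub dataset there is exactly one right answer to assert.

#include <cassert>
#include <map>
#include <vector>

// Invented stand-in for a real pattern-mining routine.
int mostFrequentItem(const std::vector<int>& data) {
    std::map<int, int> counts;
    for (int x : data) ++counts[x];
    int best = data.front();
    int bestCount = 0;
    for (const auto& entry : counts)
        if (entry.second > bestCount) { best = entry.first; bestCount = entry.second; }
    return best;
}

int main() {
    // Stub dataset built with the customer: 5 is planted as the dominant pattern.
    const std::vector<int> stub = {1, 5, 2, 5, 3, 5, 5, 4};
    assert(mostFrequentItem(stub) == 5);  // one specific, correct answer
}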
Well, there are a few answers.
First of all, as you mentioned, take a small case study, and do the math by hand. Since you wrote the algorithm, you know what it's supposed to do, so you can do it in a limited case.
The other one is to break down every component of your program into testable parts.
If A calls B, which calls C, which calls D, and you know that A, B, C, and D all give the right answer, then you test A->B, B->C, and C->D, and you can be reasonably sure that A->D is giving the correct response.
Also, if there are other programs out there that do what you are looking to do, try to acquire their datasets. Or find an open-source project whose test data you could use, and see if your application gives similar results.
Another way to test data-mining code is to take a test set, introduce a pattern of the type you're looking for, and then test again, to see if it will separate out the new pattern from the old ones.
And, the tried and true, walk through your own code by hand and see if the code is doing what you meant it to do.
Really, the challenge here is this: because your application is meant to do a fuzzy, non-deterministic kind of task in a smart way, the very goal you hope to achieve is that the application becomes better than human beings at finding these patterns. That's great, powerful, and cool ... but if you pull it off, then it becomes very hard for any human beings to say, "In this case, the answer should be X."
In fact, ideally the computer would say, "Not really. I see why you think that, but consider these 4.2 terabytes of information over here. Have you read them yet? Based on those, I would argue that the answer should be Z."
And if you really succeeded in your original goal, the end user might sometimes say, "Zowie, you're right. That is a better answer. You found a pattern that is going to make us money! (or save us money, or whatever)."
If such a thing could never happen, then why are you asking the computer to detect these kinds of patterns in the first place?
So, the best thing I can think of is to let real life help you build up a list of test scenarios. If there ever was a pattern discovered in the past that did turn out to be valuable, then make a "unit test" that sees if your system discovers it when given similar data. I say "unit test" in quotes because it may be more like an integration test, but you may still choose to use NUnit or VS.Net or RSpec or whatever unit test tools you're using.
For some of these tests, you might somehow try to "mock" the 4.2 terabytes of data (you won't really mock the data, but at some higher level you'd mock some of the conclusions reached from that data). For others, maybe you have a "test database" with some data in it, from which you expect a set of patterns to be detected.
Also, if you can do it, it would be great if the system could "describe its reasoning" behind the patterns it detects. This would let the business user deliberate over the question of whether the application was right or not.
This is tricky. This sounds similar to writing tests around our text search engine. If you keep struggling, you'll figure something out:
Start with a small, simplified but reasonably representative data sample, and test basic behavior doing this
Rather than asserting that the output is exactly some answer, sometimes it's better to figure out what is important about it. For example, for our search engine, I didn't care so much about the exact order in which the documents were listed, as long as the three key ones were on the first page of results (see the sketch after this list).
As you make a small, incremental change, figure out what the essence of it is and write a test for that. Even though the overall calculations take many inputs, individual changes to the codebase should be isolatable. For example, we found certain documents weren't being surfaced because of the presence of hyphens in some of the key words. We created tests verifying that this behaved how we expected.
Look at tools like FitNesse, which allow you to throw a large number of datasets at a piece of code and assert things about the results. This may be easier to understand than more traditional unit tests.
I've gone back to the product owner, saying "I can't understand how this will work. How will we know if it's right?" Maybe s/he can articulate the essence of the vaguely defined problem. This has worked really well for me many times, and I've talked people out of features because they couldn't be explained.
Be creative!
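A sketch of the "assert what matters" idea from the second point above, with search and the document names invented for illustration: the test requires the three key documents to appear on the first page without over-specifying their order.

#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Invented stand-in for the real search engine call.
std::vector<std::string> search(const std::string& /*query*/) {
    return {"intro.txt", "faq.txt", "guide.txt", "misc.txt"};
}

int main() {
    std::vector<std::string> results = search("install guide");
    if (results.size() > 10) results.resize(10);  // keep the first page only

    // Assert the property we care about, not the exact ordering.
    for (const char* key : {"intro.txt", "faq.txt", "guide.txt"})
        assert(std::find(results.begin(), results.end(), key) != results.end());
}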
Ultimately, you have to decide what your program should be doing, and then test for that.