Refactoring and Test Driven Development - unit-testing

I'm currently reading two excellent books: "Working Effectively with Legacy Code" and "Clean Code".
They are making me think about the way I write and work with code in completely new ways. One theme common to both is test-driven development and the idea of smothering everything with tests, having tests in place before you make a change or implement a new piece of functionality.
This has led to two questions:
Question 1:
If I am working with legacy code, according to the books I should put tests in place to ensure I'm not breaking anything. Consider that I have a method 500 lines long. I assume I'll have a set of equivalent testing methods to test that method. When I split this function up, do I create new tests for each new method/class that results?
According to "Clean Code", any test that takes longer than 1/10th of a second is a test that takes too long. Trying to test a 500-line legacy method that goes to databases and does god knows what else could well take longer than 1/10th of a second. While I understand you need to break dependencies, what I'm having trouble with is the initial test creation.
Question 2:
What happens when the code is refactored so much that structurally it no longer resembles the original code (parameters added to or removed from methods, etc.)? It would follow that the tests need refactoring too. In that case you could potentially be altering the functionality of the system while the tests keep passing. Is refactoring the tests an appropriate thing to do in this circumstance?
While it's OK to plod on with assumptions, I was wondering whether there are any thoughts/suggestions on such matters from collective experience.

That's the deal when working with legacy code, legacy here meaning a system with no tests that is tightly coupled. When adding tests for that code, you are effectively adding integration tests. When you refactor and add the more specific test methods that avoid the network calls, etc., those become your unit tests. You want to keep both, just keep them separate; that way most of your unit tests will run fast.
You do this in really small steps. You switch continually between tests and code, and you are correct: if you change a signature (a small step), the related tests need to be updated.
Also check my "update 2" on How can I improve my junit tests. It isn't about legacy code and the coupling it already has, but about how to go about writing logic plus tests where external systems are involved, e.g. databases, email, etc.
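As a sketch of that split (all class and method names here are invented for illustration): once pure logic is extracted from a method that also hit the database, the fast unit test needs no I/O at all, while the slow end-to-end test stays in the integration suite:

```java
// Hypothetical example: the discount calculation used to be buried in a
// long method that also opened database connections. Extracted as a
// pure function, it can be unit-tested in well under 0.1s.
class PriceCalculator {
    // Pure logic: no network, no database, fast to test.
    static double discountedPrice(double unitPrice, int quantity) {
        double discount = quantity >= 10 ? 0.10 : 0.0; // bulk discount
        return unitPrice * quantity * (1.0 - discount);
    }
}
```

The integration test that exercises the original code path against a real database is kept, but separated out, so the fast unit tests can run on every change.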

The 0.1s unit-test run time is fairly silly. There's no reason unit tests shouldn't use a network socket, read a large file or do other hefty operations if they have to. Yes, it's nice if the tests run quickly so you can get on with the main job of writing the application, but it's much nicer to end up with the best result at the end, and if that means running a unit test that takes 10s, then that's what I'd do.
If you're going to refactor, the key is to spend as much time as you need to understand the code you are refactoring. One good way of doing that is to write a few unit tests for it. As you grasp what certain blocks of code are doing, you can refactor them, and it's good practice to write tests for each of your new methods as you go.

Yes, create new tests for new methods.
I'd see the 1/10 of a second as a goal you should strive for. A slower test is still much better than no test.
Try not to change the code and the test at the same time. Always take small steps.

When you've got a lengthy legacy method that does X (and maybe Y and Z because of its size), the real trick is not breaking the app by 'fixing' it. The tests on the legacy app have preconditions and postconditions and so you've got to really know those before you go breaking it up. The tests help to facilitate that. As soon as you break that method into two or more new methods, obviously you need to know the pre/post states for each of those and so tests for those 'keep you honest' and let you sleep better at night.
I don't tend to worry too much about the 1/10th of a second assertion. Rather, the goal when I'm writing unit tests is to cover all my bases. Obviously, if a test takes a long time, it might be because what is being tested is simply way too much code doing way too much.
The bottom line is that you definitely don't want to take what is presumably a working system and 'fix' it to the point that it works sometimes and fails under certain conditions. That's where the tests can help. Each of them expects the world to be in one state at the beginning of the test and a new state at the end. Only you can know if those two states are correct. All the tests can 'pass' and the app can still be wrong.
Anytime the code gets changed, the tests will possibly change and new ones will likely need to be added to address changes made to the production code. Those tests work with the current code - doesn't matter if the parameters needed to change, there are still pre/post conditions that have to be met. It isn't enough, obviously, to just break up the code into smaller chunks. The 'analyst' in you has to be able to understand the system you are building - that's job one.
Working with legacy code can be a real chore depending on the 'mess' you start with. I really find that knowing what you've got and what it is supposed to do (and whether it actually does it at step 0 before you start refactoring it) is key to a successful refactoring of the code. One goal, I think, is that I ought to be able to toss out the old stuff, stick my new code in its place and have it work as advertised (or better). Depending on the language it was written in, the assumptions made by the original author(s) and the ability to encapsulate functionality into containable chunks, it can be a real trick.
Best of luck!

Here's my take on it:
No and yes. The first thing is to have a unit test that checks the output of that 500-line method. Only then do you begin thinking of splitting it up. Ideally the process goes like this:
Write a test for the original legacy 500-line behemoth
Figure out, marking first with comments, what blocks of code you could extract from that method
Write a test for each block of code. All will fail.
Extract the blocks one by one. Concentrate on getting the methods to go green one at a time.
Rinse and repeat until you've finished the whole thing
After this long process you will realize that some methods might make sense elsewhere, or are repetitive and several can be reduced to a single function; this is how you know that you succeeded. Edit the tests accordingly.
Go ahead and refactor, but as soon as you need to change signatures, make the change in your tests first, before you make the change in your actual code. That way you make sure that you're still making the correct assertions given the change in method signature.
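A tiny invented sketch of steps 1 and 4 of that process: the public entry point keeps its observable behaviour, so the characterization test written before refactoring still passes, while the extracted block gets its own test:

```java
// Invented example: "process" stands in for the 500-line method.
// Step 1's test locks in its output; step 4 extracts one block
// ("normalize"), which then gets its own unit test.
class Behemoth {
    // Extracted block: was inline in the big method, now testable alone.
    static String normalize(String raw) {
        return raw.trim().toLowerCase();
    }

    // Public entry point: observable behaviour unchanged, so the
    // characterization test written before refactoring still passes.
    static String process(String raw) {
        return "user:" + normalize(raw);
    }
}
```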

Question 1: "When I split this function up, do I create new tests for each new method/class that results?"
As always, the real answer is: it depends. When refactoring a gigantic monolithic method into smaller methods that handle its different component parts, it may be simpler to make your new methods private/protected and leave your existing API intact, so you can continue to use your existing unit tests. If you need to test your newly split-off methods, it is sometimes advantageous to mark them package-private so that your unit-test classes can get at them but other classes cannot.
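For example (a hypothetical class, in Java): an extracted helper left package-private is reachable from a test class in the same package, while the public API stays untouched and the existing tests keep passing:

```java
// Hypothetical class: the public API is unchanged, so existing unit
// tests keep working; the extracted helper has default (package-private)
// visibility, so a test class in the same package can call it directly.
class OrderService {
    public int totalWithTax(int subtotal) {
        return subtotal + taxOn(subtotal);
    }

    // No access modifier: visible to tests in the same package,
    // hidden from client code in other packages.
    static int taxOn(int subtotal) {
        return subtotal / 5; // flat 20% tax, purely for illustration
    }
}
```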
Question 2: "What happens when the code is re-factored so much that structurally it no longer resembles the original code?"
My first piece of advice here is to get a good IDE and a good knowledge of regular expressions: try to do as much of your refactoring with automated tools as possible. This can save time if you are cautious enough not to introduce new problems. As you said, you will have to change your unit tests, but if you used good OOP principles (you did, right?), it shouldn't be so painful.
Overall, it is important to ask yourself whether the benefits of the refactor outweigh the costs. Am I just fiddling around with architectures and designs? Am I refactoring in order to understand the code, and is it really needed? I would consult a coworker who is familiar with the code base for an opinion on the costs and benefits of your current task.
Also remember that the theoretical ideal you read in books needs to be balanced with real world business needs and time schedules.

Related

How do I ensure that I don't break the test code when I refactor it?

Code evolves, and as it does, it also decays if not pruned, a bit like a garden in that respect. Pruning means refactoring the code so it keeps fulfilling its evolving purpose.
Refactoring is much safer if we have a good unit test coverage.
Test-driven development forces us to write the test code first, before the production code. Hence, we can't test the implementation, because there isn't any. This makes it much easier to refactor the production code.
The TDD cycle is something like this: write a test, test fails, write production code until the test succeeds, refactor the code.
But from what I've seen, people refactor the production code, but not the test code. As test code decays, the production code will go stale and then everything goes downhill. Therefore, I think it is necessary to refactor test code.
Here's the problem: How do you ensure that you don't break the test code when you refactor it?
(I've done one approach, https://thecomsci.wordpress.com/2011/12/19/double-dabble/, but I think there might be a better way.)
Apparently there's a book, http://www.amazon.com/dp/0131495054, which I haven't read yet.
There's also a Wiki page about this, http://c2.com/cgi/wiki?RefactoringTestCode, which doesn't have a solution.
Refactoring your tests is a two-step process. Simply stated: first, use your application under test to ensure that the tests pass while refactoring. Then, after your refactored tests are green, ensure that they can still fail. Doing this properly requires some specific steps.
To properly test your refactored tests, you must change the application under test so as to cause the test to fail; only that test's condition should fail. That way you ensure the test fails properly, in addition to passing. Strive for a single test failure, though that won't always be possible (i.e. for tests other than unit tests). If you are refactoring correctly, there will be a single failure in the refactored tests, and any other failures will be in tests not related to the current refactoring. Understanding your codebase is required to properly identify cascading failures of this type, and such failures only apply to tests other than unit tests.
I think you should not change your test code.
Why?
In TDD, you define an interface for a class.
This interface contains methods defined with a certain set of functionality: the requirements/design.
First: these requirements do not change while you refactor your production code. Refactoring means changing/cleaning the code without changing the functionality.
Second: the test checks a certain set of functionality, and this set stays the same.
Conclusion: Refactoring test and refactoring your production code are two different things.
Tip: when writing your tests, write clean code. Make small tests that really test one piece of the functionality.
But "your design changes because of unforeseen changes to the requirements". This may or may not lead to changes in the interface.
When your requirements change, your tests must change. This is not avoidable.
Keep in mind that this is a new TDD cycle: first test the new functionality and remove the tests for the old functionality, then implement the new design.
To make this work properly, you need clean and small tests.
Example:
MethodOne does: changeA and changeB
Don't put this in one unit test; make a test class with two unit tests.
Both execute MethodOne, but they check for different results (changeA, changeB).
When the specification of changeA changes, you only need to rewrite one unit test.
When MethodOne gets a new specification changeC: Add a unit test.
With the above example your tests will be more agile and easier to change.
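A concrete (invented) version of that example: one method makes two observable changes, and each change gets its own test, so a change to one specification leaves the other test untouched:

```java
// Invented stand-in for "MethodOne": deposit() makes two observable
// changes, so the test class holds two tests, one per change.
class Account {
    int balance;           // checked by the "changeA" test
    int transactionCount;  // checked by the "changeB" test

    void deposit(int amount) {
        balance += amount;   // changeA
        transactionCount++;  // changeB
    }
}
```

One test calls deposit and asserts only the balance; a second test calls deposit and asserts only the transaction count.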
Summary:
Don't refactor your tests when refactoring your production code.
Write clean and agile tests.
Hope this helps.
Good luck with it.
Disclaimer: I do not want your money if this makes you rich.
How do you ensure that you don't break the test code when you refactor it?
Rerunning the tests should suffice in most cases.
There are some other strategies described here but they might be overkill compared to the few benefits you get.
Um... this is a Java solution; I don't know what language you're programming in!
OK, I just read "Clean Code" by one of the Martins, a book which argues that refactoring test code to keep it clean and readable is a fine idea, and indeed a goal. So the ambition to refactor and keep test code clean is good, not a silly idea like I first thought.
But that's not what you asked, so let's take a shot at answering!
I'd keep a db of your tests - or the last test result, anyway.
With a bit of Java annotation work, you can do something like this:
@SuperTestingFrameworkCapable
public class MyFancyTest {
    @TestEntry
    @Test
    public void testXEqualsYAfterConstructors() {
        @TestElement
        // create my object X
        @TestElement
        // create my object Y
        @TheTest
        assertTrue(X.equals(Y));
    }
}
ANYWAY, you'd also need a reflection and annotation-processing superclass to inspect this code. It could just be an extra step in your build: write tests, pass them through this super processor, and then, if that passes, run the tests.
And your super processor is going to use a schema
MyFancyTest
And for each member of your class it will use a new table; here the (only) table would be testXEqualsYAfterConstructors.
And that table would have columns for each item marked with the @TestElement annotation, plus a column for @TheTest.
I suppose you'd just call the columns TestElement1, TestElement2 etc etc
And THEN, once it had set all this up, it would just save the variable names and the line annotated @TheTest.
So the table would be
testXEqualsYAfterConstructors
TestElement1 | TestElement2 | TheTest
SomeObjectType X | SomeObjectType Y | assertTrue(X.equals(Y));
So, if the super processor finds the tables already exist, it can compare what is there with what is now in the code, and raise an alert for each differing entry. And you can create a new user, an Admin, who gets the changes and can review them, Crucible-style, and approve them or not.
And then you can market this solution for this problem, sell your company for $100M and give me 20%.
cheers!
Slow day, so here's the rationale:
Your solution uses a lot of extra overhead, most damagingly in the actual production code. Your prod code should never be tied to your test code, and it certainly shouldn't have random variables that are test-specific in it.
The next suggestion I have about the code you put up is that your framework doesn't stop people breaking tests. After all, you can have this:
@Test
public void equalsIfSameObject()
{
    Person expected = createPerson();
    Person actual = expected;
    check(Person.FEATURE_EQUAL_IF_SAME_OBJECT);
    boolean isEqual = actual.equals(expected);
    assertThat(isEqual).isTrue();
}
But if I change the last two lines of that code in some "refactoring" of the test classes, your framework will report a success, yet the test won't do anything. You really need to ensure an alert is raised so people can look at the "difference".
Then again, you might just want to use SVN or Perforce plus Crucible to compare and check this stuff!
Also, seeing as you're keen on a new idea, you'll want to read about local annotations: http://stackoverflow.com/questions/3285652/how-can-i-create-an-annotation-processor-that-processes-a-local-variable
Um, so you might also need that guy's custom Java compiler (see the last comment in the link above).
Disclaimer: if you create a new company with code that pretty much follows the above, I reserve the right to 20% of the company if and when you're worth more than $30M, at a time of my choosing.
Your question was one of my main questions about refactoring two months ago. Just let me explain my experience:
When you want to refactor a method, you should cover it with unit tests (or any other tests) to be sure you are not breaking anything during refactoring. (In my case the team knew the code worked well because they had been using it for 6 years; they just needed to improve it, so all of my unit tests passed in the first step.) So in the first step you have some passing unit tests that cover all the scenarios. If some unit tests fail, first fix the problem to be sure your method works correctly.
After all tests pass, you refactor the method and run your tests again to be sure everything is right. Any changes in the test code?
You should write tests that are independent of the internal structure of the method. After refactoring you should only need to change some small part of the test code, and most of the time no changes are required, because refactoring improves the structure without changing the behavior. If your test code needs to change a lot, you never know whether you've broken something in the main code during refactoring.
And the most important thing for me: remember that each test should consider one behavior.
I hope I explained it well.

When should I design and document my test cases?

In the SDLC, the testing phase comes right after implementation. However, test-driven development encourages us to test while implementing. And in my lecture course, the professor said test cases should be part of the design.
I am a junior developer. When implementing a new feature, when should I design and document my test cases?
I have found it impractical to test all the cases after finishing the implementation, because once a case fails, I have to change the code and retest all cases again. Is there a way to overcome or avoid this? I know automated testing is one solution, but automated tests cannot simulate all of the test cases, especially integration test cases that involve different parties.
Also, in my test cases, should I test all parts of the code, or just the functionality of that feature request? Or does it actually depend on how much time you've got?
Many thanks.
Your question isn't so easy to answer, because, as you say, "it actually depends on how much time you got." Here are some opinions though:
Test after implementation: No
As a programmer, you're an expensive and scarce resource with multiple deadlines stacked on top of each other. So effectively, this means "never test": after you've implemented one chunk of code, you will move on to the next chunk, meaning to come back and write tests "when you have time" (you never have time).
There are also the problems you mention. If you do all your testing after the code is written, and your tests discover something fundamentally wrong, you have to go back and fix all your code as well as all your tests.
Test while implementing: Yes
This method is actually really helpful once you get a rhythm for it. You write a bit of a class, then write a bit of a unit test, and continually revise your tests and your code until you're finished. I believe it is actually faster than writing code without tests.
It is also particularly helpful when working on a large project. Running a unit test to see if a little module is working is instantaneous. Building and loading your entire application to see if a little module is working may take several minutes. It may also interrupt your concentration (which costs at least 10 minutes).
What to test: As much as possible
100% test coverage is probably never practical, but absolutely test the critical pieces of your program: things that perform mathematical computation or a lot of business logic. Test everything that's left over as much as possible. There's no reason to test a toString() function, unless it happens to be critical to business logic or something.
Also, keep your tests as simple as possible: just inputs and outputs. Most of my test functions are two or three lines. If your function is hard to test because there are too many combinations, that's a sign it might need to be broken up a little bit. Make sure to test edge cases and "impossible" scenarios.
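For instance (a made-up business rule), short tests that just pair inputs with expected outputs, with the boundary as the edge case and a guard for the "impossible" input:

```java
// Invented rule: orders of 50 or more ship free, smaller orders cost a
// flat 5. Tests are input/output pairs; the boundary (49 vs 50) is the
// edge case, and a negative total is the "impossible" scenario.
class Shipping {
    static int shippingCost(int orderTotal) {
        if (orderTotal < 0) {
            throw new IllegalArgumentException("negative total");
        }
        return orderTotal >= 50 ? 0 : 5;
    }
}
```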
My experience:
Document your tests if the code is not self-explanatory, or if the case being tested is a 'tricky' corner case that is not obvious at first sight (even if the code is).
Don't create separate documents for your tests. Put everything in comments and Javadocs if you are using Java. In other words, keep this information close to the code. That's where it is needed.
About designing and implementation: just iterate. Write some implementation code, then a bit of test code for it, then more implementation code, and so on until you are done with both. It goes faster than writing all the implementation, then testing, then rewriting failing implementation code. You can't anticipate at design time all the tests you'll need; it's impossible. So, no worries if you don't get them all.
If you cover more than 80% of the code you are already good; more is better. Sometimes code can't be tested. I recommend using test-coverage tools, such as Emma for Java.
OR it actually depends on how much time you got?
The time you save by not testing never pays for the time you have to spend fixing bugs later in the project. A proper test suite always pays off down the road. Always.

TDD with unclear requirements

I know that TDD helps a lot, and I like this method of development where you first create a test and then implement the functionality. It is a very clear and correct way to work.
But due to the flavour of my projects, it often happens that when I start to develop a module I know very little about what I want and how it will look in the end. The requirements appear as I develop; there may be 2 or 3 iterations in which I delete all or part of the old code and write new code.
I see two problems:
1. I want to see the result as soon as possible to understand whether my ideas are right or wrong. Unit tests slow down this process. So it often happens that I write unit tests after the code is finished, which is known to be a bad pattern.
2. If I write the tests first, I need to rewrite not only the code two or more times but also the tests. It takes a lot of time.
Could someone please tell me how can TDD be applied in such situation?
Thanks in advance!
I want to see the result as soon as possible to understand whether my ideas are right or wrong. Unit tests slow down this process.
I disagree. Unit tests and TDD can often speed up getting results because they force you to concentrate on the results rather than on implementing tons of code that you might never need. They also let you run the different parts of your code as you write them, so you can constantly see what results you are getting, rather than having to wait until your entire program is finished.
I find that TDD works particularly well in this kind of situation; in fact, I would say that having unclear and/or changing requirements is actually very common.
I find that the best uses of TDD is ensuring that your code is doing what you expect it to do. When you're writing any code, you should know what you want it to do, whether the requirements are clear or not. The strength of TDD here is that if there is a change in the requirements, you can simply change one or more of your unit tests to reflect the changed requirements, and then update your code while being sure that you're not breaking other (unchanged) functionality.
I think one thing that trips up a lot of people with TDD is the assumption that all tests need to be written ahead of time. I find it more effective to use the rule of thumb that you never write implementation code while all of your tests are passing; this ensures that all code is covered and that it does what you want, without worrying about writing all your tests up front.
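A minimal illustration of that rule of thumb, using the classic FizzBuzz kata (my example, not from the original answer): a failing test such as expecting "Fizz" for 3 is written first, and the code below is just enough implementation to turn it, and the subsequent tests, green:

```java
// FizzBuzz kata as a red/green sketch: each branch exists only because
// a previously failing test demanded it.
class FizzBuzz {
    static String fizzBuzz(int n) {
        if (n % 15 == 0) return "FizzBuzz";
        if (n % 3 == 0) return "Fizz";
        if (n % 5 == 0) return "Buzz";
        return Integer.toString(n);
    }
}
```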
IMHO, your main problem is having to delete code. That is waste, and it is what should be addressed first.
Perhaps you could prototype, or utilize "spike solutions" to validate the requirements and your ideas then apply TDD on the real code, once the requirements are stable.
The risk is to apply this and to have to ship the prototype.
Also you could test-drive the "sunny path" first and only implement the remaining such as error handling ... after the requirements have been pinned down. However the second phase of the implementation will be less motivating.
What development process are you using? It sounds agile, as you're having iterations, but not in an environment that fully supports it.
TDD will, for just about anybody, slow down initial development. So, if initial development speed is 10 on a 1-10 scale, with TDD you might get around an 8 if you're proficient.
It's the development after that point that gets interesting. As projects get larger, development efficiency typically drops - often to 3 on the same scale. With TDD, it's very possible to still stay in the 7-8 range.
Look up "technical debt" for a good read. As far as I'm concerned, any code without unit tests is effectively technical debt.
TDD helps you express the intent of your code. This means that in writing the test, you have to state what you expect from your code. How your expectations are fulfilled is then secondary (that is the implementation). Ask yourself: what is more important, the implementation, or the functionality provided? If it is the implementation, you don't need to write the tests. If it is the functionality, writing the tests first will help you.
Another valuable thing is that with TDD you will not implement functionality that isn't needed. You only write code that satisfies the intent. This is also called YAGNI (You Ain't Gonna Need It).
There's no getting away from it: if you measure how long coding takes just by how long it takes you to write the classes, etc., it'll take longer with TDD. If you're experienced it'll add about 15%; if you're new it'll take at least 60% longer, if not more.
BUT, overall you'll be quicker. Why?
by writing a test first you're specifying what you want and delivering just that and nothing more - hence saving time writing unused code
without tests, you might think that the results are so obvious that what you've done is correct - when it isn't. Tests demonstrate that what you've done is correct.
you will get faster feedback from automated tests than by doing manual testing
with manual testing the time taken to test everything as your application grows increases rapidly - which means you'll stop doing it
with manual tests it's easy to make mistakes and 'see' something passing when it isn't, this is especially true if you're running them again and again and again
(good) unit tests give you a second client to your code which often highlights design problems that you might miss otherwise
Add all this up, and if you measure from inception to delivery, TDD is much, much faster: you get fewer defects, you take fewer risks, you progress at a steady rate (which makes estimation easier), and the list goes on.
TDD will make you faster, no question, but it isn't easy and you should allow yourself some space to learn and not get disheartened if initially it seems slower.
Finally you should look at some techniques from BDD to enhance what you're doing with TDD. Begin with the feature you want to implement and drive down into the code from there by pulling out stories and then scenarios. Concentrate on implementing your solution scenario by scenario in thin vertical slices. Doing this will help clarify the requirements.
Using TDD can actually make you write code faster: not being able to write a test for a specific scenario can mean there is an issue in the requirements.
When you do TDD you find these problematic places sooner, instead of after writing 80% of your code.
There are a few things you can do to make your tests more resistant to change:
Reuse code inside your tests in the form of factory methods that create your test objects, along with verify methods that check the test results. That way, if some major behavior change occurs in your code, you have less code to change in your tests.
Use an IoC container instead of passing arguments to your main classes; then if a method signature changes, you do not need to change all of your tests.
Make your unit tests short and isolated: each test should check only one aspect of your code, and use a mocking/isolation framework to make the test independent of external objects.
Test and write code only for the required feature (YAGNI). Ask yourself what value your customer will receive from the code you're writing. Don't create an overcomplicated architecture; instead, build the needed functionality piece by piece while refactoring your code as you go.
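The first suggestion might look like this in Java (everything here is hypothetical): a factory method is the single place that knows how to build a test object, and a verify method is the single place that knows what "looks right" means, so a constructor or behaviour change touches one spot instead of every test:

```java
// Hypothetical test-support class: tests call createOrder() instead of
// the constructor, and verifyOrder() instead of repeating assertions.
class OrderTestSupport {
    static class Order {
        final String customer;
        final int total;
        Order(String customer, int total) {
            this.customer = customer;
            this.total = total;
        }
    }

    // Factory method used by every test that needs an Order.
    static Order createOrder() {
        return new Order("test-customer", 100);
    }

    // Shared verification used by every test that checks an Order.
    static boolean verifyOrder(Order o, int expectedTotal) {
        return o.customer != null && o.total == expectedTotal;
    }
}
```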
Here's a blog post I found potent in explaining the use of TDD on a very iterative design process scale: http://blog.extracheese.org/2009/11/how_i_started_tdd.html.
Joshua Bloch commented on something similar in the book "Coders at Work". His advice was to write examples of how the API would be used (about a page in length), then think about the examples and the API a lot and refactor the API. Then write the specification and the unit tests. Be prepared, however, to refactor the API and rewrite the spec as you implement the API.
When I deal with unclear requirements, I know that my code will need to change. Having solid tests helps me feel more comfortable changing my code. Practising TDD helps me write solid tests, and so that's why I do it.
Although TDD is primarily a design technique, it has one great benefit in your situation: it encourages the programmer to consider details and concrete scenarios. When I do this, I notice that I find gaps or misunderstandings or lack of clarity in requirements quite quickly. The act of trying to write tests forces me to deal with the lack of clarity in the requirements, rather than trying to sweep those difficulties under the rug.
So when I have unclear requirements, I practise TDD both because it helps me identify the specific requirements issues that I need to address, but also because it encourages me to write code that I find easier to change as I understand more about what I need to build.
In this early prototype-phase I find it to be enough to write testable code. That is, when you write your code, think of how to make it possible to test, but for now, focus on the code itself and not the tests.
You should have the tests in place when you commit something though.

Practical refactoring using unit tests

Having just read the first four chapters of Refactoring: Improving the Design of Existing Code, I embarked on my first refactoring and almost immediately came to a roadblock. It stems from the requirement that before you begin refactoring, you should put unit tests around the legacy code. That allows you to be sure your refactoring didn't change what the original code did (only how it did it).
So my first question is this: how do I unit-test a method in legacy code? How can I put a unit test around a 500-line (if I'm lucky) method that doesn't do just one task? It seems to me that I would have to refactor my legacy code just to make it unit-testable.
Does anyone have any experience refactoring using unit tests? And, if so, do you have any practical examples you can share with me?
My second question is somewhat hard to explain. Here's an example: I want to refactor a legacy method that populates an object from a database record. Wouldn't I have to write a unit test that compares an object retrieved using the old method, with an object retrieved using my refactored method? Otherwise, how would I know that my refactored method produces the same results as the old method? If that is true, then how long do I leave the old deprecated method in the source code? Do I just whack it after I test a few different records? Or, do I need to keep it around for a while in case I encounter a bug in my refactored code?
Lastly, since a couple people have asked...the legacy code was originally written in VB6 and then ported to VB.NET with minimal architecture changes.
For instructions on how to refactor legacy code, you might want to read the book Working Effectively with Legacy Code. There's also a short PDF version available here.
Good example of theory meeting reality. Unit tests are meant to test a single operation, and many pattern purists insist on Single Responsibility, so we have lovely clean code and tests to go with it. However, in the real (messy) world, code (especially legacy code) does lots of things and has no tests. What this needs is a dose of refactoring to clean up the mess.
My approach is to build tests, using the unit test tools, that test lots of things in a single test. In one test, I may be checking that the DB connection is open, changing lots of data, and doing a before/after check on the DB. I inevitably find myself writing helper classes to do the checking, and more often than not those helpers can then be added into the code base, as they have encapsulated emergent behaviour/logic/requirements. I don't mean I have a single huge test; what I do mean is that many tests are doing work which a purist would call an integration test - does such a thing still exist? I've also found it useful to create a test template and then create many tests from it, to check boundary conditions, complex processing, etc.
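The before/after pattern described above might be sketched like this in Ruby. Note the helper class (StateSnapshot) and the Hash standing in for the database are invented for illustration, not any real framework API:

```ruby
# Invented helper: records a "before" copy of some state so a coarse,
# integration-style test can check what changed after the operation ran.
class StateSnapshot
  def initialize(store)
    @store  = store
    @before = store.dup   # record state before the operation runs
  end

  # keys whose values differ from the recorded "before" state
  def changed_keys
    @store.keys.select { |k| @before[k] != @store[k] }
  end
end

# a Hash stands in for the database in this sketch
db   = { "acct:1" => 100, "acct:2" => 50 }
snap = StateSnapshot.new(db)

# the "lots of things" a coarse legacy operation might do
db["acct:1"] -= 30
db["acct:2"] += 30

# afterwards, one test can assert both which rows changed and that
# the overall invariant (total balance) still holds
```

The helper earns its keep exactly as the answer says: once it encapsulates a real invariant, it can migrate into the code base itself.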
BTW which language environment are we talking about? Some languages lend themselves to refactoring better than others.
From my experience, I'd write tests not for particular methods in the legacy code, but for the overall functionality it provides. These might or might not map closely to existing methods.
Write tests at whatever level of the system you can (if you can); if that means running a database etc. then so be it. You will need to write a lot more code to assert what the code is currently doing, as a 500-line+ method is likely to have a lot of behaviour wrapped up in it. As for comparing the old versus the new: if you write the tests against the old code, they pass, and they cover everything it does, then when you run them against the new code you are effectively checking the old against the new.
I did this to test a complex SQL trigger I wanted to refactor. It was a pain and took time, but a month later, when we found another issue in that area, it was worth having the tests there to rely on.
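One way to sketch that old-versus-new check is a characterization test: record what the old code actually returns, then run the identical assertions against the refactored version. Everything below (LegacyMapper, CleanMapper, the mapping rules) is invented for illustration:

```ruby
# Stands in for the observable behaviour of the big legacy method.
class LegacyMapper
  def build(row)
    { name: row[0].to_s.strip.upcase, active: row[1] == "Y" }
  end
end

# The refactored replacement, split into small named steps.
class CleanMapper
  def build(row)
    { name: normalize(row[0]), active: flag?(row[1]) }
  end

  private

  def normalize(value)
    value.to_s.strip.upcase
  end

  def flag?(value)
    value == "Y"
  end
end

# Characterization cases: expected values recorded by running the *old*
# code against real records, including awkward ones (nil, padding).
CASES = [
  [["  alice ", "Y"], { name: "ALICE", active: true  }],
  [[nil, "N"],        { name: "",      active: false }],
]

def passes?(mapper)
  CASES.all? { |row, expected| mapper.build(row) == expected }
end
```

Because both implementations run against the same recorded cases, the old method can be deleted as soon as the new one passes; the recorded expectations, not the old code, are what you keep around.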
In my experience this is the reality when working on legacy code. The book mentioned by Esko (Working Effectively with Legacy Code) is an excellent work describing various approaches that can take you there.
I have seen similar issues with our unit tests themselves, which have grown to become system/functional tests. The most important thing when developing tests for legacy or existing code is to define the term "unit". It can even be a functional unit like "reading from the database" etc. Identify the key functional units and maintain tests which add value.
As an aside, there was recent talk between Joel S. and Martin F. on TDD/unit-tests. My take is that it is important to define unit and keep focus on it! URLS: Open Letter, Joel's transcript and podcast
That really is one of the key problems of trying to retrofit legacy code. Are you able to break the problem domain down into something more granular? Does that 500+ line method do anything other than make system calls to JDK/Win32/.NET Framework JARs/DLLs/assemblies? I.e., are there more granular function calls within that 500+ line behemoth that you could unit test?
The following book: The Art of Unit Testing contains a couple of chapters with some interesting ideas on how to deal with legacy code in terms of developing Unit Tests.
I found it quite helpful.

Should unit tests be written before the code is written?

I know that one of the defining principles of Test driven development is that you write your Unit tests first and then write code to pass those unit tests, but is it necessary to do it this way?
I've found that I often don't know what I am testing until I've written it, mainly because the past couple of projects I've worked on have more evolved from a proof of concept rather than been designed.
I've tried to write my unit tests before and it can be useful, but it doesn't seem natural to me.
Some good comments here, but I think that one thing is getting ignored.
Writing tests first drives your design. This is an important step. If you write the tests "at the same time" or "soon after" you might be missing some design benefits of doing TDD in micro steps.
It feels really cheesy at first, but it's amazing to watch things unfold before your eyes into a design that you didn't think of originally. I've seen it happen.
TDD is hard, and it's not for everybody. But if you already embrace unit testing, then try it out for a month and see what it does to your design and productivity.
You spend less time in the debugger and more time thinking about outside-in design. Those are two gigantic pluses in my book.
There have been studies that show that unit tests written after the code has been written are better tests. The caveat though is that people don't tend to write them after the event. So TDD is a good compromise as at least the tests get written.
So if you write tests after you have written code, good for you, I'd suggest you stick at it.
I tend to find that I do a mixture. The more I understand the requirements, the more tests I can write up front. When the requirements - or my understanding of the problem - are weak, I tend to write tests afterwards.
TDD is not about the tests, but how the tests drive your code.
So basically you are writing tests to let an architecture evolve naturally (and don't forget to refactor !!! otherwise you won't get much benefit out of it).
That you have an arsenal of regression tests and executable documentation afterwards is a nice side effect, but not the main reason behind TDD.
So my vote is:
Test first
PS: And no, that doesn't mean that you don't have to plan your architecture beforehand, but that you might rethink it if the tests tell you to do so!!!!
I've led development teams for the past 6-7 years. What I can tell you for sure is that, for me and for the developers I have worked with, it makes a phenomenal difference in the quality of the code if we know where our code fits into the big picture.
Test Driven Development (TDD) helps us answer "What?" before we answer "How?" and it makes a big difference.
I understand why there may be apprehensions about not following it in PoC-type development/architecture work, and you are right that it may not make complete sense to follow this process there. At the same time, I would like to emphasize that TDD is a process that falls in the Development Phase (I know it sounds obsolete, but you get the point :) when the low-level specifications are clear.
I think writing the test first helps define what the code should actually do. Too many times people don't have a good definition of what the code is supposed to do or how it should work. They simply start writing and make it up as they go along. Creating the test first makes you focus on what the code will do.
Not always, but I find that it really does help when I do.
I tend to write them as I write my code. At most I will write a test for whether the class/module exists before I write it.
I don't plan far enough ahead in that much detail to write a test earlier than the code it is going to test.
I don't know if this is a flaw in my thinking or method's or just TIMTOWTDI.
I start with how I would like to call my "unit" and make it compile.
like:
picker = Pick.new
item = picker.pick('a')
assert item
then I create
class Pick
  def pick(something)
    return nil
  end
end
then I keep on using Pick in my "test" case so I can see how I would like it to be called and how I would treat different kinds of behavior. Whenever I realize I could have trouble on some boundary or with some kind of error/exception, I try to get it to fire and add a new test case.
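Continuing that sketch, a boundary case like the ones mentioned might turn into a test as soon as it is spotted. The Pick implementation below is a made-up stand-in, not the author's real code:

```ruby
# Invented fleshed-out version of the Pick skeleton: once the happy
# path works, each boundary/error condition gets its own small case.
class Pick
  def pick(letter)
    unless letter.is_a?(String) && letter.size == 1
      raise ArgumentError, "single letters only"
    end
    letter.upcase   # stand-in for whatever "picking" really does
  end
end

picker = Pick.new

# the original happy-path usage
item = picker.pick("a")

# boundary condition, written the moment the trouble was noticed:
# an empty string should fire the error path, not silently succeed
begin
  picker.pick("")
  error_raised = false
rescue ArgumentError
  error_raised = true
end
```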
So, in short. Yes.
For me, the ratio is heavily in favour of writing the test before rather than not.
Directives are suggestions on how you could do things to improve the overall quality or productivity, or even both, of the end product. They are in no way laws to be obeyed, lest you be smitten in a flash by the god of proper coding practice.
Here's my compromise on the take and I found it quite useful and productive.
Usually the hardest part to get right is the requirements, and right behind that the usability of your class, API, package... Then comes the actual implementation.
1. Write your interfaces (they will change, but they go a long way in knowing WHAT has to be done)
2. Write a simple program to use the interfaces (the stupid main). This goes a long way in determining HOW it is going to be used (go back to 1 as often as needed)
3. Write tests on the interfaces (the bit I integrated from TDD; again, go back to 1 as often as needed)
4. Write the actual code behind the interfaces
5. Write tests on the classes and the actual implementation; use a coverage tool to make sure you do not forget weird execution paths
So, yes, I write tests before coding, but never before I've figured out what needs to be done with a certain level of detail. These are usually high-level tests that treat the whole as a black box. They will usually remain as integration tests and will not change much once the interfaces have stabilized.
Then I write a bunch of tests (unit tests) on the implementation behind it; these will be much more detailed and will change often as the implementation evolves, as it gets optimized and expanded.
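Those two layers might look something like this in Ruby; the SlugGenerator class and its rules are invented. The public method is what the stable, black-box tests hit, while the private helper is what the finer-grained, frequently changing unit tests exercise:

```ruby
# Invented example of the two test layers described above.
class SlugGenerator
  # public interface: the stable black-box tests only call this
  def slug(title)
    normalize(title).gsub(/\s+/, "-")
  end

  private

  # implementation detail: detailed unit tests on this can change
  # freely as the implementation is optimized, without touching the
  # interface-level tests
  def normalize(title)
    title.to_s.downcase.strip
  end
end
```

The black-box layer asserts only that `slug("  Hello World ")` yields `"hello-world"`; how the whitespace handling is split into helpers underneath is free to evolve.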
Is this strictly speaking TDD ? Extreme ? Agile...? whatever... ? I don't know, and frankly I don't care. It works for me. I adjust it as needs go and as my understanding of software development practice evolve.
My 2 cents.
I've been programming for 20 years, and I've virtually never written a line of code that I didn't run some kind of unit test on--Honestly I know people do it all the time, but how someone can ship a line of code that hasn't had some kind of test run on it is beyond me.
Often if there is no test framework in place I just write a main() into each class I write. It adds a little cruft to your app, but someone can always delete it (or comment it out) if they want I guess. I really wish there was just a test() method in your class that would automatically compile out for release builds--I love my test method being in the same file as my code...
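In Ruby the same trick can be done with a guard that only runs when the file is executed directly, so the self-test costs nothing when the class is required as a library. The Greeter class here is an invented example:

```ruby
# Invented example of the "main() in each class" style of quick test:
# the guarded block runs only when this file is executed directly
# (ruby greeter.rb), never when it is loaded via require.
class Greeter
  def greet(name)
    "Hello, #{name}!"
  end
end

if __FILE__ == $PROGRAM_NAME
  # a quick sanity check, the Ruby analogue of an ad-hoc main()
  raise "greet failed" unless Greeter.new.greet("world") == "Hello, world!"
  puts "self-test ok"
end
```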
So I've done both Test Driven Development and Tested development. I can tell you that TDD can really help when you are a starting programmer. It helps you learn to view your code "From outside" which is one of the most important lessons a programmer can learn.
TDD also helps you get going when you are stuck. You can just write some very small piece that you know your code has to do, then run it and fix it--it gets addictive.
On the other hand, when you are adding to existing code and know pretty much exactly what you want, it's a toss-up. Your "Other code" often tests your new code in place. You still need to be sure you test each path, but you get a good coverage just by running the tests from the front-end (except for dynamic languages--for those you really should have unit tests for everything no matter what).
By the way, when I was on a fairly large Ruby/Rails project we had a very high percentage of test coverage. We refactored a major, central model class into two classes. It would have taken us two days, but with all the tests we had to refactor, it ended up closer to two weeks. Tests are NOT completely free.
I'm not sure, but from your description I sense that there might be a misunderstanding on what test-first actually means. It does not mean that you write all your tests first. It does mean that you have a very tight cycle of
1. Write a single, minimal test
2. Make the test pass by writing the minimal production code necessary
3. Write the next test that will fail
4. Make all the existing tests pass by changing the existing production code in the simplest possible way
5. Refactor the code (both test and production!) so that it doesn't contain duplication and is expressive
6. Continue with 3. until you can't think of another sensible test
One cycle (3-5) typically just takes a couple of minutes. Using this technique, you actually evolve the design while you write your tests and production code in parallel. There is not much up front design involved at all.
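One such micro-cycle might look like this; the zero-padding example is invented purely to show the rhythm, with each comment marking a step of the loop above:

```ruby
# Step 1: the first minimal test (written before any production code):
#     raise unless pad(7) == "07"
# Step 2: the minimal code to pass it was simply:  def pad(n) = "0#{n}"
# Step 3: the next failing test:
#     raise unless pad(12) == "12"
# Step 4: the simplest change that makes *both* tests pass:
def pad(n)
  n < 10 ? "0#{n}" : n.to_s
end
# Step 5: refactor (nothing to clean up yet), then back to step 3.
```

The design (here, the ternary) falls out of the tests rather than being planned up front, which is exactly the point of keeping the cycle this tight.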
On the question of it being "necessary" - no, it obviously isn't. There have been countless projects successful without doing TDD. But there is some strong evidence out there that using TDD typically leads to significantly higher quality, often without negative impact on productivity. And it's fun, too!
Oh, and regarding it not feeling "natural", it's just a matter of what you are used to. I know people who are quite addicted to getting a green bar (the typical xUnit sign for "all tests passing") every couple of minutes.
There are so many answers now, and they are all different. This perfectly reflects the reality out there: everyone is doing it differently. I think there is a huge misunderstanding about unit testing. It seems to me as if people heard about TDD and said it's good, then started to write unit tests without really understanding what TDD really is. They just got the part "oh yeah, we have to write tests" and agree with it. They have also heard "you should write your tests first", but they do not take it seriously.
I think it's because they do not understand the benefits of test-first, which in turn you can only understand once you've done it this way for some time. And they always seem to find 1,000,000 excuses why they don't like writing the tests first: because it's too difficult when figuring out how everything will fit together, etc. In my opinion, those are all excuses to hide from their inability to discipline themselves, try the test-first approach, and start to see the benefits.
The most ridiculous thing is when they argue "I'm not convinced about this test-first thing, but I've never done it this way" ... great ...
I wonder where unit testing originally comes from, because if the concept really originates from TDD then it's just ridiculous how people get it wrong.
Writing the tests first defines how your code will look - i.e. it tends to make your code more modular and testable, so you do not create "bloated" methods with very complex and overlapping functionality. It also helps to isolate all core functionality in separate methods for easier testing.
Personally, I believe unit tests lose a lot of their effectiveness if not done before writing the code.
The age-old problem with testing is that no matter how hard we think about it, we will never come up with every possible scenario to write a test to cover.
Obviously unit testing itself doesn't prevent this completely, as it is restrictive testing: it looks at only one unit of code and does not cover the interactions between this code and everything else. But it provides a good basis for writing clean code in the first place, which should at least reduce the chances of interaction issues between modules. I've always worked to the principle of keeping code as simple as it possibly can be - in fact I believe this is one of the key principles of TDD.
So you start off with a test that basically says you can create a class of this type, and build it up from there, in theory writing a test for every line of code, or at least covering every route through a particular piece of code. Designing as you go! Obviously this is based on a rough up-front design produced initially, to give you a framework to work to.
As you say, it is very unnatural to start with and can seem like a waste of time, but I've seen first hand that it pays off in the long run: when defect stats come through, they show that the modules written fully using TDD have far fewer defects over time than the others.
Before, during and after.
Before is part of the spec, the contract, the definition of the work
During is when special cases, bad data, exceptions are uncovered while implementing.
After is maintenance, evolution, change, new requirements.
I don't write the actual unit tests first, but I do make a test matrix before I start coding listing all the possible scenarios that will have to be tested. I also make a list of cases that will have to be tested when a change is made to any part of the program as part of regression testing that will cover most of the basic scenarios in the application in addition to fully testing the bit of code that changed.
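A test matrix like that can later be turned into a table-driven test, one row per scenario. The discount rules below are invented for illustration:

```ruby
# Invented function under test: the rules exist only for this sketch.
def discount(total, coupon)
  return 0.0 if total <= 0     # empty or invalid orders get nothing
  coupon ? total * 0.10 : 0.0  # 10% off only with a coupon
end

# The scenario matrix, written down before coding. Each row is one
# case from the matrix: inputs on the left, expected result on the right.
MATRIX = [
  # total, coupon, expected
  [100.0,  true,   10.0],
  [100.0,  false,   0.0],
  [0.0,    true,    0.0],   # boundary: empty order
  [-5.0,   true,    0.0],   # bad data
]
```

Running the whole matrix in a loop keeps the regression list in one place, so adding a scenario after a change is just adding a row.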
Remember, with Extreme Programming your tests effectively are your documentation. So if you don't know what you're testing, then you don't know what you want your application to do.
You can start off with "Stories" which might be something like
"Users can Get list of Questions"
Then you start writing code to satisfy the unit tests. To solve the above you'll need at least a User and a Question class. So then you can start thinking about the fields:
"User Class Has Name DOB Address TelNo Locked Fields"
etc.
Hope it helps.
Crafty
Yes, if you are using true TDD principles. Otherwise, as long as you're writing the unit-tests, you're doing better than most.
In my experience, it is usually easier to write the tests before the code, because by doing it that way you give yourself a simple debugging tool to use as you write the code.
I write them at the same time. I create the skeleton code for the new class and the test class, and then I write a test for some functionality (which then helps me to see how I want the new object to be called), and implement it in the code.
Usually, I don't end up with elegant code the first time around; it's normally quite hacky. But once all the tests are working, you can refactor away until you end up with something pretty neat, tidy and provably rock solid.
When you are writing something you are used to writing, it helps to first write tests for all the things you would regularly check, and then write those features. More often than not, those features are the most important parts of the software you are writing. On the other side, there are no silver bullets, and things should never be followed to the letter. Developer judgment plays a big role in deciding between test-driven development and test-later development.