Is duplicated code more tolerable in unit tests? - unit-testing

I ruined several unit tests some time ago when I went through and refactored them to make them more DRY--the intent of each test was no longer clear. It seems there is a trade-off between tests' readability and maintainability. If I leave duplicated code in unit tests, they're more readable, but then if I change the SUT, I'll have to track down and change each copy of the duplicated code.
Do you agree that this trade-off exists? If so, do you prefer your tests to be readable, or maintainable?

Readability is more important for tests. If a test fails, you want the problem to be obvious. The developer shouldn't have to wade through a lot of heavily factored test code to determine exactly what failed. You don't want your test code to become so complex that you need to write unit-test-tests.
However, eliminating duplication is usually a good thing, as long as it doesn't obscure anything, and eliminating the duplication in your tests may lead to a better API. Just make sure you don't go past the point of diminishing returns.

Duplicated code is a smell in unit test code just as much as in other code. If you have duplicated code in tests, it makes it harder to refactor the implementation code because you have a disproportionate number of tests to update. Tests should help you refactor with confidence, rather than be a large burden that impedes your work on the code being tested.
If the duplication is in fixture set up, consider making more use of the setUp method or providing more (or more flexible) Creation Methods.
If the duplication is in the code manipulating the SUT, then ask yourself why multiple so-called “unit” tests are exercising the exact same functionality.
If the duplication is in the assertions, then perhaps you need some Custom Assertions. For example, if multiple tests have a string of assertions like:
assertEqual('Joe', person.getFirstName())
assertEqual('Bloggs', person.getLastName())
assertEqual(23, person.getAge())
Then perhaps you need a single assertPersonEqual method, so that you can write assertPersonEqual(Person('Joe', 'Bloggs', 23), person). (Or perhaps you simply need to overload the equality operator on Person.)
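For illustration, here is a minimal sketch of such a custom assertion in Python's unittest style. The Person class, its accessors, and the parse_person call are hypothetical, mirroring the snippet above:

    import unittest

    class PersonAssertions:
        # Mixin holding the custom assertion so any TestCase can reuse it.
        def assertPersonEqual(self, expected, actual):
            self.assertEqual(expected.getFirstName(), actual.getFirstName())
            self.assertEqual(expected.getLastName(), actual.getLastName())
            self.assertEqual(expected.getAge(), actual.getAge())

    class PersonParsingTest(PersonAssertions, unittest.TestCase):
        def test_parses_full_name_and_age(self):
            person = parse_person("Joe Bloggs, 23")  # hypothetical code under test
            self.assertPersonEqual(Person('Joe', 'Bloggs', 23), person)

The benefit is that a failure now points at the person comparison as a whole, and the three field checks live in one place instead of being repeated in every test.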
As you mention, it is important for test code to be readable. In particular, it is important that the intent of a test is clear. I find that if many tests look mostly the same, (e.g. three-quarters of the lines the same or virtually the same) it is hard to spot and recognise the significant differences without carefully reading and comparing them. So I find that refactoring to remove duplication helps readability, because every line of every test method is directly relevant to the purpose of the test. That's much more helpful for the reader than a random combination of lines that are directly relevant, and lines that are just boilerplate.
That said, sometimes tests exercise complex situations that are similar but still significantly different, and it is hard to find a good way to reduce the duplication. Use common sense: if you feel the tests are readable and make their intent clear, and you're comfortable with perhaps needing to update more than a theoretically minimal number of tests when refactoring the code invoked by the tests, then accept the imperfection and move on to something more productive. You can always come back and refactor the tests later, when inspiration strikes!

Implementation code and tests are different animals and factoring rules apply differently to them.
Duplicated code or structure is always a smell in implementation code. When you start having boilerplate in implementation, you need to revise your abstractions.
On the other hand, testing code must maintain a level of duplication. Duplication in test code achieves two goals:
Keeping tests decoupled. Excessive test coupling can make it hard to change a single failing test that needs updating because the contract has changed.
Keeping the tests meaningful in isolation. When a single test is failing, it must be reasonably straightforward to find out exactly what it is testing.
I tend to ignore trivial duplication in test code as long as each test method stays shorter than about 20 lines. I like when the setup-run-verify rhythm is apparent in test methods.
When duplication creeps up in the "verify" part of tests, it is often beneficial to define custom assertion methods. Of course, those methods must still test a clearly identified relation that can be made apparent in the method name: assertPegFitsInHole -> good, assertPegIsGood -> bad.
When test methods grow long and repetitive I sometimes find it useful to define fill-in-the-blanks test templates that take a few parameters. Then the actual test methods are reduced to a call to the template method with the appropriate parameters.
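As a rough sketch of such a template in Python's unittest (the Account class and its withdraw method are made-up names for illustration):

    import unittest

    class WithdrawalTest(unittest.TestCase):
        # The template keeps the setup-run-verify rhythm in one place;
        # each test method only fills in the interesting values.
        def check_withdrawal(self, starting_balance, amount, expected_balance):
            account = Account(balance=starting_balance)  # hypothetical class under test
            account.withdraw(amount)
            self.assertEqual(expected_balance, account.balance)

        def test_partial_withdrawal_reduces_balance(self):
            self.check_withdrawal(starting_balance=100, amount=40, expected_balance=60)

        def test_full_withdrawal_empties_the_account(self):
            self.check_withdrawal(starting_balance=100, amount=100, expected_balance=0)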
As for a lot of things in programming and testing, there is no clear-cut answer. You need to develop a taste, and the best way to do so is to make mistakes.

You can reduce repetition using several different flavours of test utility methods.
I'm more tolerant of repetition in test code than in production code, but I have been frustrated by it sometimes. When you change a class's design and you have to go back and tweak 10 different test methods that all do the same setup steps, it's frustrating.

I agree. The trade-off exists, but it is different in different places.
I'm more likely to refactor duplicated code for setting up state. But less likely to refactor the part of the test that actually exercises the code. That said, if exercising the code always takes several lines of code then I might think that is a smell and refactor the actual code under test. And that will improve readability and maintainability of both the code and the tests.

Jay Fields coined the phrase that "DSLs should be DAMP, not DRY", where DAMP means descriptive and meaningful phrases. I think the same applies to tests, too. Obviously, too much duplication is bad. But removing duplication at all costs is even worse. Tests should act as intent-revealing specifications. If, for example, you specify the same feature from several different angles, then a certain amount of duplication is to be expected.

"refactored them to make them more DRY--the intent of each test was no longer clear"
It sounds like you had trouble doing the refactoring. I'm just guessing, but if it wound up less clear, doesn't that mean you still have more work to do so that you have reasonably elegant tests which are perfectly clear?
That's why tests are a subclass of UnitTest -- so you can design good test suites that are correct, easy to validate and clear.
In the olden times we had testing tools that used different programming languages. It was hard (or impossible) to design pleasant, easy-to-work with tests.
You have the full power of -- whatever language you're using -- Python, Java, C# -- so use that language well. You can achieve good-looking test code that's clear and not too redundant. There's no trade-off.

I LOVE rspec because of this:
It has two things that help:
Shared example groups for testing common behaviour: you can define a set of tests, then 'include' that set in your real tests.
Nested contexts: you can essentially have a 'setup' and 'teardown' method for a specific subset of your tests, not just for every test in the class.
The sooner that .NET/Java/other test frameworks adopt these methods, the better (or you could use IronRuby or JRuby to write your tests, which I personally think is the better option)

I feel that test code requires a similar level of engineering that would normally be applied to production code. There can certainly be arguments made in favor of readability and I would agree that's important.
In my experience, however, I find that well-factored tests are easier to read and understand. If there are five tests that each look the same except for one variable that changes and the assertion at the end, it can be very difficult to find that single differing item. Similarly, if the tests are factored so that only the changing variable and the assertion are visible, then it's easy to figure out what each test is doing immediately.
Finding the right level of abstraction when testing can be difficult and I feel it is worth doing.

I don't think there is a real connection between duplication and readability. I think your test code should be as good as your other code. Non-repeating code is more readable than duplicated code when done well.

Ideally, unit tests shouldn't change much once they are written so I would lean towards readability.
Having unit tests be as discrete as possible also helps to keep the tests focused on the specific functionality that they are targeting.
With that said, I do tend to try and reuse certain pieces of code that I wind up using over and over, such as setup code that is exactly the same across a set of tests.

Related

Goal of unit testing and TDD: find/minimize bugs or improve design?

I'm fairly green to unit testing and TDD, so please bear with me as I ask what some may consider newbie questions, or if this has been debated before. If this turns out to be considered a "bad question" (too subjective and open for debate), I will happily close it. However, I've searched for a couple of days and am not getting a definitive answer, and I need a better understanding of this, so I know no better way to get more info than to post here.
I've started reading an older book on unit testing (because a colleague had it on hand), and its opening chapter talks about why to unit test. One of the points it makes is that in the long run, your code is much more reliable and cleaner, and less prone to bugs. It also points out that effective unit testing will make tracking and fixing bugs much easier. So it seems to focus quite a bit on the overall prevention/reduction of bugs in your code.
On the other hand, I also found an article about writing great unit tests, and it states that the goal of unit testing is to make your design more robust, and conversely, finding bugs is the goal of manual testing, not unit testing.
So being the newbie to TDD that I am, I'm a little confused as to the state of mind with which I should go into TDD and building my unit tests. I'll admit that part of the reason I'm taking this on now with my recently started project is because I'm tired of my changes breaking previously existing code. And admittedly, the linked article above does at least point this out as an advantage to TDD. But my hope is that by going back in and adding unit tests to my existing code (and then continuing TDD from this point forward) is to help prevent these bugs in the first place.
Are this book and this article really saying the same thing in different tones, or is there some subjectivity on this subject, and what I'm seeing is just two people having somewhat different views on how to approach TDD?
Thanks in advance.
Unit tests and automated tests generally are for both better design and verified code.
A unit test should test one execution path in a very small unit. This unit is usually a public method, or an internal method exposed on your object. The method itself can still use many other protected or private methods from the same object instance. You can have a single method and several unit tests for it, each exercising a different execution path. (By execution path I mean something controlled by an if, switch, etc.) Writing unit tests this way validates that your code really does what you expect. This can be especially important in corner cases where you expect an exception to be thrown in some rare scenario. You can also test how the method behaves when you pass different parameters - for example, null instead of an object instance, or a negative value for an integer used for indexing. That is especially useful for a public API.
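As a small illustration (find_index is a made-up unit), separate tests can pin down the different execution paths, including the corner cases mentioned above:

    import unittest

    class FindIndexTest(unittest.TestCase):
        def test_returns_position_of_existing_item(self):
            self.assertEqual(1, find_index(["a", "b", "c"], "b"))  # happy path

        def test_returns_minus_one_when_item_is_missing(self):
            self.assertEqual(-1, find_index(["a", "b"], "z"))      # alternate branch

        def test_raises_when_collection_is_none(self):
            with self.assertRaises(ValueError):                    # rare/error path
                find_index(None, "a")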
Now suppose that your tested method also uses instances of other classes. How do you deal with that? Should you still test just your single method and trust that the other class works? What if that class is not implemented yet? What if it has some complex logic inside? Should you test those execution paths through your current method as well? There are two approaches to deal with this:
For some cases you will simply let the real class instance be tested together with your method. This is, for example, very common with logging (it is not bad to have logs available during tests as well).
For other scenarios you would like to take these dependencies away from your method, but how? The solution is dependency injection and implementing against an abstraction instead of an implementation. What does that mean? It means that your method / class will not create instances of these dependencies; instead, it will receive them through method parameters, the class constructor, or class properties. It also means that you depend not on a concrete implementation but on an abstract base class or interface. This allows you to pass a fake, dummy, or mock implementation to the object under test. These special implementations do no real processing; they take some data and return an expected result. This lets you test your method without its dependencies and leads to a much better, more extensible design, as sketched below.
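A minimal sketch of constructor injection against an abstraction, with a fake standing in for the real dependency. All names here (RateProvider, PriceCalculator, FixedRate) are invented for illustration:

    from abc import ABC, abstractmethod
    import unittest

    class RateProvider(ABC):
        # The abstraction the class under test depends on,
        # instead of a concrete implementation.
        @abstractmethod
        def current_rate(self) -> float: ...

    class PriceCalculator:
        def __init__(self, rate_provider: RateProvider):  # constructor injection
            self._rates = rate_provider

        def price_in_local_currency(self, base_price: float) -> float:
            return base_price * self._rates.current_rate()

    class FixedRate(RateProvider):
        # A fake: no real processing, just a canned answer for the test.
        def __init__(self, rate: float):
            self._rate = rate

        def current_rate(self) -> float:
            return self._rate

    class PriceCalculatorTest(unittest.TestCase):
        def test_converts_using_the_injected_rate(self):
            calculator = PriceCalculator(FixedRate(2.0))
            self.assertEqual(20.0, calculator.price_in_local_currency(10.0))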
What is the disadvantage? Once you start using fakes/mocks you are testing a single method/class, but you don't have a test that grabs all the real implementations and puts them together to check that the whole system really works. You can have thousands of unit tests validating that each of your methods works, but that doesn't mean they work together. That is the scenario for more complex tests - integration or end-to-end tests.
Unit tests should usually be very easy to write - if they are not, your design is probably too complicated and you should think about refactoring. They should also be very fast to execute, so you can run them very often. Other kinds of tests can be more complex and very slow, and they should run mostly on the build server.
How does this fit into the software development process? The worst part of the development process is stabilization and bug fixing, because it is very hard to estimate. To estimate how long fixing a bug will take, you must know what causes it, and that investigation cannot be estimated. A bug may take one hour to fix, but two weeks of debugging and searching to find. With good code coverage you will most probably find such a bug early during development.
Automated testing doesn't mean your software contains no bugs. It only means you did your best to find and fix them during development, and because of that your stabilization phase can be much less painful and much shorter. It also doesn't mean your software does what it should - that is about the application logic itself, which must be covered by separate tests going through each use case / user story - acceptance tests (which can also be automated).
How does this fit with TDD? TDD takes it to the extreme: you write your tests first, to drive quality, code coverage, and design.
It's a false choice. "Find/minimize bugs" OR improve design.
TDD, in particular (and as opposed to "just" unit testing) is all about giving you better design.
And when your design is better, what are the consequences?
Your code is easier to read
Your code is easier to understand
Your code is easier to test
Your code is easier to reuse
Your code is easier to debug
Your code has fewer bugs in the first place
With well-designed code, you spend less time finding and fixing bugs, and more time adding features and polish. So TDD gives you a savings on bugs and bug-hunting, by giving you better design. These things are not separate; they are dependent and interrelated.
There can be many different reasons why you might want to test your code. Personally, I test for a number of reasons:
I usually design an API using a combination of the normal design patterns (top-down) and test-driven development (TDD; bottom-up) to ensure that I have a sound API both from a best-practices point of view and from an actual usage point of view. The focus of the tests is both on the major use-cases for the API and on the completeness of the API and its behavior - so they are primarily "black box" tests. The development sequence is often:
main API based on design patterns and "gut feeling"
TDD tests for the major use-cases according to the high-level specification for the API - primarily in order to make sure the API is "natural" and easy to use
fleshed out API and behavior
all the needed test cases to ensure the completeness and correct behavior
Whenever I fix an error in my code, I try to write a test to make sure it stays fixed. Somehow, the error got into my original design and passed my original testing of the code, so it is probably not all that trivial. I have noticed that many of these tests are "white box" tests.
In order to be able to make any sort of major re-factoring of the code, you need an extensive set of API tests to make sure the behavior of the code stays the same after the re-factoring. For any non-trivial API, I want the test suite to be in place and working for a long time before the re-factoring, to be sure that all the major use-cases are covered in a good way. As often as not, you are forced to throw away most of your "white box" tests as they - by their very definition - make too many assumptions about the internals. I usually try to "translate" as many of these tests as possible, as the same non-trivial problems tend to survive re-factoring of the code.
In order to transfer any code between developers, I usually also want a good test suite with focus on the API and the major use-cases. So basically the tests from the initial TDD...
I think the answer to your question is: both.
You will improve design because there is one particular thing about TDD that is great: while you write tests you put yourself in the position of the client code that will be using the system under test - and this alone makes you think about certain design choices.
For example: UI. When you start writing the tests, you will see that those God-Forms are impossible to test, so you separate the logic behind the screens to a presenter/controller, and you get MVP/MVC/whatever.
Having the concept of unit testing a class and mocking its dependencies brings you to the Single Responsibility Principle. There is a similar point to be made about each of the SOLID principles.
As for bugs, well, if you unit test every method of every class you write (except properties, very simple methods and such) you will catch most bugs at the start. Write integration tests as well, and you cover almost all of them.
I'll take my stab at this using a remix of a previous answer I wrote. In short, I don't see this as a dichotomy between driving good design and minimizing bugs. I see it more as one (good design) leading to the other (minimizing bugs).
I tend towards saying TDD is a design process that happens to involve unit testing. It's a design process because within each Red-Green-Refactor iteration, you write the test first for code that doesn't exist. You're designing as you're going.
The first beauty of TDD is that the design of your code is guaranteed to be testable. Testable code tends to have loose coupling and high cohesion. Loose coupling and high cohesion are important because they make the code easy to change when requirements change. The second beauty of TDD is that after you're done implementing your system, you happen to have a huge regression suite to catch any bugs and changes in assumptions. Thus, TDD makes your code easy to change because of the design it creates and it makes your code safe to change because of the test harness it creates.
Trying to retrospectively add unit tests can be quite painful and expensive. If the code doesn't support unit testing, you may be better off writing integration tests to test your code.
Don't mix Unit Testing with TDD.
Unit testing is simply the practice of testing your code to ensure quality and maintainability.
TDD is a full-blown development methodology in which you first write your tests (based on requirements), and only then do you write the code needed (and just the code needed) to make those tests pass. This means that you only write code to repair a broken test.
Once that's done, you write another test, and the code needed to make it pass. Along the way, you may be forced to refactor the code so that a new test passes without breaking another. This way, the "design" arises from the tests.
The purpose of this methodology is of course to reduce bugs and improve design, but its main goal is to improve productivity, because you write exactly the code you need. And you don't write documentation: the tests are the documentation. If a requirement changes, then you change the tests and the code afterwards. If new requirements appear, just add new tests.

What does “DAMP not DRY” mean when talking about unit tests?

I heard someone say that unit tests (e.g. nUnit, jUnit, xUnit) should be
DAMP not DRY
(E.g. unit tests should contain "damp code" not "dry code")
What are they talking about?
It's a balance, not a contradiction
DAMP and DRY are not contradictory, rather they balance two different aspects of a code's maintainability. Maintainable code (code that is easy to change) is the ultimate goal here.
DAMP (Descriptive And Meaningful Phrases) promotes the readability of the code.
To maintain code, you first need to understand the code. To understand it, you have to read it. Consider for a moment how much time you spend reading code. It's a lot.
DAMP increases maintainability by reducing the time necessary to read and understand the code.
DRY (Don't repeat yourself) promotes the orthogonality of the code.
Removing duplication ensures that every concept in the system has a single authoritative representation in the code. A change to a single business concept results in a single change to the code. DRY increases maintainability by isolating change (risk) to only those parts of the system that must change.
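As a tiny illustration of that single authoritative representation (the pricing rule and all names are invented), compare a duplicated business rule with its DRY form:

    # Duplicated: the same 20% VAT rule written out in two places,
    # so a rate change must be found and fixed twice.
    def order_total(lines):
        return sum(line.price * line.quantity for line in lines) * 1.20

    def invoice_total(lines):
        return sum(line.price * line.quantity for line in lines) * 1.20

    # DRY: one authoritative representation; a change to the business
    # concept means a change in exactly one place.
    VAT_RATE = 1.20

    def total_with_vat(lines):
        return sum(line.price * line.quantity for line in lines) * VAT_RATE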
So, why is duplication more acceptable in tests?
Tests often contain inherent duplication because they are testing the same thing over and over again, only with slightly different input values or setup code. However, unlike production code, this duplication is usually isolated only to the scenarios within a single test fixture/file. Because of this, the duplication is minimal and obvious, which means it poses less risk to the project than other types of duplication.
Furthermore, removing this kind of duplication reduces the readability of the tests. The details that were previously duplicated in each test are now hidden away in some new method or class. To get the full picture of the test, you now have to mentally put all these pieces back together.
Therefore, since test code duplication often carries less risk and promotes readability, it's easy to see why it is considered acceptable.
As a principle, favor DRY in production code, favor DAMP in test code. While both are equally important, with a little wisdom you can tip the balance in your favor.
DAMP - Descriptive And Meaningful Phrases.
"DAMP not DRY" values readability over code re-use. The idea of DAMP not DRY in test cases is that tests should be easy to understand, even if that means test cases sometimes have repeated code.
See also Is duplicated code more tolerable in unit tests? for some discussion on the merits of this viewpoint.
It may have been coined by Jay Fields, in relation to Domain Specific Languages.
"DRY" is "Don't repeat yourself"
This is a term which is used to tell people to write code that is reusable, so that you don't end up writing similar code over and over again.
"DAMP" is "Descriptive And Meaningful Phrases".
This term is intended to tell you to write code which can easily be understood by someone who is looking at it. If you are following this principle, you will have long and descriptive variable and function names, etc.
Damp = 'Descriptive And Meaningful Phrases' - your unit tests should be able to be 'read':
Readability is more important than avoiding redundant code.
From the article:
DAMP stands for “descriptive and meaningful phrases” and is the opposite of DRY, not in the sense that it says “everything should look like a trash heap and be impossible to read”, in that readability is more important than avoiding redundant code.
What does this mean and where to use it?
DAMP mostly applies when writing test code. Test code should be very easy to understand to the point that some redundancy is acceptable.
There are several answers here already, but I wanted to add another as I didn't think they necessarily explained it as well as they could.
The idea of DRY (Don't repeat yourself) is that in your application code you want to avoid redundant or repetitive code. If you've got something that your code needs to do multiple times you should have a function or class for it, rather than repeating similar code in several places.
This is a fairly well known programming concept.
DAMP (Descriptive and Meaningful Phrases) is for your unit tests. The idea here is that your unit test method names should be long and descriptive -- effectively short sentences that describe what you're testing.
eg: testWhenIAddOneAndOneIShouldGetTwo() { .... }
When you read a DAMP method name like this, you should understand exactly what the test writer was trying to achieve, without even having to read the test code (although the test code can also follow this concept, of course, with wordy variable names, etc.).
This is possible because a unit test method has very specific input and expected output, so the DAMP principle works well for them. Methods in your main application code are unlikely to be specific enough to warrant names like this, especially if you've written it with the DRY principle in mind.
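A small sketch of what that can look like in Python's unittest (the Calculator class is a made-up example):

    import unittest

    class CalculatorTest(unittest.TestCase):
        # The method name reads as a short sentence; the wordy variable
        # name keeps the body just as self-describing.
        def test_when_i_add_one_and_one_i_should_get_two(self):
            calculator = Calculator()  # hypothetical class under test
            result_of_adding_one_and_one = calculator.add(1, 1)
            self.assertEqual(2, result_of_adding_one_and_one)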
DAMP and DRY do not contradict each other -- they cover different aspects of how your code is written -- but nonetheless they aren't typically used together, because methods written with the DRY principle in mind would be general-purpose and unlikely to be suited to highly specific method names. In general therefore, as explained above, your application code should be DRY and your unit test code DAMP.
I hope that helps explain it a bit better.
DAMP stands for “descriptive and meaningful phrases” and is the opposite of DRY, not in the sense that it says “everything should look like a trash heap and be impossible to read”, in that readability is more important than avoiding redundant code.
http://codeshelter.wordpress.com/2011/04/07/dry-and-damp-principles-when-developing-and-unit-testing/
I agree with Chris Edwards in that you need to strike a balance between the two. Another thing to note is that if, in an attempt to remove duplication, you end up adding a lot of additional structure in your unit test code (i.e. when taking DRY to extremes), you run the risk of introducing bugs in there. In such a situation, you would either have to unit test your unit tests or leave bits of structure untested.
I don't wish to duplicate the effort here, but you can have tests that are DAMP but have the benefit of DRY. On the flip side, DRY tests won't satisfy DAMP tests in some cases.
I've blogged about DRY vs DAMP which includes some examples.
Neither approach should be your only solution, sometimes DAMP is overkill, other times a very nice addition.
As a general rule you should apply the rule of three. If you spot duplication a third time, it may be worth looking into writing DAMP style tests, but even then not all duplication is bad. Context matters.

When doing TDD, why should I do "just enough" to get a test passing?

Looking at posts like this and others, it seems that the correct way to do TDD is to write a test for a feature, get just that feature to pass, and then add another test and refactor as necessary until it passes, then repeat.
My question is: why is this approach used? I completely understand the write tests first idea, because it helps your design. But why wouldn't I create all tests for a specific function, and then implement that function all at once until all tests pass?
The approach comes from the Extreme Programming principle of You Aren't Going to Need It. If you write a single test, then the code that makes it pass, and repeat that process, you usually find that you write just enough to get things working. You don't invent new features that are not needed. You don't handle corner cases that don't exist.
Try an experiment. Write out the list of tests you think you need. Set it aside. Then go with the one-test-at-a-time approach. See if the lists differ and why. When I do that I almost always end up with fewer tests. I almost always find that I invented a case I didn't need when I do it the all-tests-first way.
For me, it is about "thought burden." If I have all of the possible behaviors to worry about at once, my brain is strained. If I approach them one at a time, I can give full attention to solving the immediate problem.
I believe this derives from the principle of "YAGNI" ("You Ain't Gonna Need It")(*), which states that classes should be as simple as necessary, with no extra features. Hence when you need a feature, you write a test for it, then you write the feature, then you stop. If you wrote a number of tests first, clearly you would be merely speculating on what your API would need to be at some point in the future.
(*) I generally translate that as "You are too stupid to know what will be needed in the future", but that's another topic......
IMHO it reduces the chance of over-engineering the piece of code you are writing.
It's just easier to add unnecessary code when you are looking at several different usage scenarios at once.
Dan North has suggested that there is no such thing as test-driven design because the design is not really driven out by testing -- that these unit tests only become tests once functionality is implemented, but during the design phase you are really designing by example.
This makes sense -- your tests are setting up a range of sample data and conditions with which the system under test is going to operate, and you drive out design based on these example scenarios.
Some of the other answers suggest that this is based on YAGNI. This is partly true.
Beyond that, though, there is the issue of complexity. As is often stated, programming is about managing complexity -- breaking things down into comprehensible units.
If you write 10 tests to cover cases where param1 is null, param2 is null, string1 is empty, int1 is negative, and the current day of the week is a weekend, and then go to implement that, you are having to juggle a lot of complexity at once. This opens up space to introduce bugs, and it becomes very difficult to sort out why tests are failing.
On the other hand, if you write the first test to cover an empty string1, you barely have to think about the implementation. Once the test is passing, you move on to a case where the current day is a weekend. You look at the existing code and it becomes obvious where the logic should go. You run tests and if the first test is now failing, you know that you broke it while implementing the day-of-the-week thing. I'd even recommend that you commit source between tests so that if you break something you can always revert to a passing state and try again.
Doing just a little at a time and then verifying that it works dramatically reduces the space for the introduction of defects, and when your tests fail after implementation you have changed so little code that it is very easy to identify the defect and correct it, because you know that the existing code was already working properly.
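To make the incremental idea concrete, here is a hedged sketch (the greeting function and its requirements are invented): the implementation below grew one test at a time, and each earlier test keeps guarding the behaviour that already worked.

    import datetime
    import unittest

    def greeting(name, today=None):
        # Grown incrementally: the empty-name branch came from the first test,
        # the weekend branch from the second.
        if not name:
            return "Hello!"
        today = today or datetime.date.today()
        if today.weekday() >= 5:  # Saturday or Sunday
            return "Hello, {}! Enjoy the weekend.".format(name)
        return "Hello, {}!".format(name)

    class GreetingTest(unittest.TestCase):
        def test_empty_name_gets_a_generic_greeting(self):      # written first
            self.assertEqual("Hello!", greeting(""))

        def test_weekend_greeting_mentions_the_weekend(self):   # written second
            a_saturday = datetime.date(2024, 1, 6)
            self.assertEqual("Hello, Ada! Enjoy the weekend.",
                             greeting("Ada", today=a_saturday))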
This is a great question. You need to find a balance between writing all tests in the universe of possible tests, and the most likely user scenarios. One test is, IMHO, not enough, and I typically like to write 3 or 4 tests which represent the most common uses of the feature. I also like to write a best case test and a worst case test as well.
Writing many tests helps you to anticipate and understand the potential use of your feature.
I believe TDD advocates writing one test at a time because it forces you to think in terms of the principle of doing the simplest thing that could possibly work at each step of development.
I think the article you sent is exactly the answer. If you write all the tests first and all of the scenarios first, you will probably write your code to handle all of those scenarios at once and most of the time you probably end up with code that is fairly complex to handle all of these.
On the other hand, if you go one at a time, you will end up refactoring your existing code each time to end up with code probably as simple as it can be for all the scenarios.
Like in the case of the link you gave in your question, had they written all the tests first, I am pretty sure they would have not ended up with a simple if/else statement, but probably a fairly complex recursive piece of code.
The reason behind the principle is simple. How practical it is to stick to is a separate question.
The reason is that if you are writing more code that what is needed to pass the current test you are writing code that is, by definition, untested. (It's nothing to do with YAGNI.)
If you write the next test to "catch up" with the production code then you've just written a test that you haven't seen fail. The test may be called "TestNextFeature" but it may as well return true for all the evidence you have on it.
TDD is all about making sure that all code - production and tests - is tested and that all those pesky "but I'm sure I wrote it right" bugs don't get into the code.
I would do as you suggest. Write several tests for a specific function, implement the function, and ensure that all of the tests for this function pass. This ensures that you understand the purpose and usage of the function separately from your implementation of it.
If you need to do a lot more implementation wise than what is tested by your unit tests, then your unit tests are likely not comprehensive enough.
I think part of that idea is to keep simplicity, keep to designed/planned features, and make sure that your tests are sufficient.
Lots of good answers above - YAGNI is the first answer that jumps to mind.
The other important thing about the 'just get the test passing' guideline though, is that TDD is actually a three stage process:
Red > Green > Refactor
Frequently revisiting the final part, the refactoring, is where a lot of the value of TDD is delivered in terms of cleaner code, better API design, and more confidence in the software. You need to refactor in really small short blocks though lest the task become too big.
It is hard to get into this habit, but stick with it, as it's an oddly satisfying way to work once you get into the cycle.

unit tests in C++

Is it okay to break all dependencies using interfaces just to make a class testable? It involves significant overhead at runtime because of many virtual calls instead of plain method invocations.
How does test driven development work in real-world C++ applications? I read Working Effectively With Legacy Code and found it quite useful, but I am not getting up to speed practising TDD.
If I do a refactoring it occurs very often that I have to completely rewrite unit tests because of massive logic changes. My code changes very often change the fundamental logic of the processing of data. I do not see a way to write unit tests which do not have to change in a large refactoring.
Maybe someone can point me to an open source C++ application which uses TDD, so I can learn by example.
Update: See this question too.
I can only answer some parts here:
Is it okay to break all dependencies using interfaces just to make a class testable? It involves significant overhead at runtime because of many virtual calls instead of plain method invocations.
If your performance will suffer too much because of it, no (benchmark!). If your development suffers too much, no (estimate extra effort). If it seems like it won't matter much and help in the long run and helps you with quality, yes.
You could always 'friend' your test classes, or add a TestAccessor object through which your tests can investigate the internals of the class under test. That avoids making everything dynamically dispatchable just for testing. (It does sound like quite a bit of work.)
Designing testable interfaces isn't easy. Sometimes you have to add some extra methods that access the innards just for testing. It makes you cringe a bit but it's good to have and more often than not those functions are useful in the actual application as well, sooner or later.
If I do a refactoring it occurs very often that I have to completely rewrite unit tests because of massive logic changes. My code changes very often change the fundamental logic of the processing of data. I do not see a way to write unit tests which do not have to change in a large refactoring.
Large refactorings by definition change a lot, including the tests. Be happy you have them, as they will test things after the refactoring too.
If you spend more time refactoring than making new features, perhaps you should consider thinking a bit more before coding to find better interfaces that can withstand more changes. Also, writing unit-tests before interfaces are stable is a pain, no matter what you do.
The more code you have against an interface that changes much, the more code you will have to change each time. I think your problem lies there. I've managed to have sufficiently stable interfaces in most places, and refactor only parts now and then.
Hope it helps.
I've routinely used macros, #if and other preprocessor tricks to "mock-out" dependencies for the purpose of unit testing in C and C++, exactly because with such macros I don't have to pay any run-time cost when the code is compiled for production rather than testing. Not elegant, but reasonably effective.
As for refactorings, they may well require changing the tests when they are so overwhelmingly large and intrusive as you describe. I don't find myself refactoring so drastically all that often, though.
The obvious answer would be to factor out dependencies using templates rather than interfaces. Of course that might hurt compile-times (depending on exactly how you implement it), but it should eliminate the runtime overhead at least. A slightly simpler solution might be to just rely on a set of typedefs, which can be swapped out with a few macros or similar.
As to your first question - it's rarely worthwhile to break things just for the sake of testing, although sometimes you might have to break things before making them better as part of your refactoring. The most important criteria for a software product is that it works, not that it's testable. Testable is only important as it helps you make a product that is more stable and functions better for your end-users.
A big part of test-driven development is selecting small, atomic parts of your code that aren't likely to change for unit testing. If you're having to rewrite a lot of unit tests because of massive logic changes, you might need to test at a finer-grained level, or re-design your code so that it's more stable. A stable design shouldn't change drastically over time, and testing won't help you avoid massive refactoring if that becomes required. However, if done right, testing can make it so that when you refactor things you can be more confident that your refactoring was successful, assuming that there are some tests that don't need to be changed.
Is it okay to break all dependencies using interfaces just to make a class testable? It involves significant overhead at runtime because of many virtual calls instead of plain method invocations.
I think it is OK to break the dependencies, since that will lead into better interfaces.
If I do a refactoring it occurs very often that I have to completely rewrite unit tests because of massive logic changes. My code changes very often change the fundamental logic of the processing of data. I do not see a way to write unit tests which do not have to change in a large refactoring.
You won't get rid of these large refactorings in any language, since your tests should be expressing the real intent of your code. So if the logic changes, your tests must change.
Maybe you aren't really doing TDD, like:
Create a test that fails
Create the code to pass the test
Create another test that fails
Fix the code to pass both tests
Rinse and repeat until you think you have enough tests that show what your code should be doing
These steps say that you should be doing minor changes, and not big ones. If you stay with the latter, you can't escape big refactors. No language will save you from that, and C++ will be the worst of them because of the compile times, link times, bad error messages, etc. etc.
I'm actually working on real-world software written in C++ with a HUGE legacy code base under it. We're using TDD and it is really helping evolve the design of the software.
If I do a refactoring it occurs very often that I have to completely rewrite unit test because of massive logic changes. ... I do not see a way to write unit tests which do not have to change in a large refactoring.
There are multiple layers of testing, and some of those layers won't break even after big logic changes. Unit tests, on the other hand, are meant to test the internals of methods and objects, and will need to change more often than that. There's nothing wrong with that, necessarily. It's just how things are.
Is [it] okay to break all dependencies using interfaces just to make a class testable?
It's definitely OK to design classes to be more testable. That's part of the purpose of TDD, after all.
It involves significant overhead at runtime because of many virtual calls instead of plain method invocations.
Just about every company has some list of rules that all employees are supposed to follow. The clueless companies simply list every good quality they can think of ("our employees are efficient, responsible, ethical, and never cut corners"). More intelligent companies actually rank their priorities. If somebody comes up with an unethical way to be efficient, does the company do it? The best companies not only print up brochures saying how the priorities are ranked, but they also make sure management follows the ranking.
It is entirely possible for a program to be efficient and easily testable. However there are times when you need to choose which is more important. This is one of those times. I don't know how important efficiency is to you and your program, but you do. So "would you rather have a slow, well-tested program, or a fast program without total test coverage?"
It involves significant overhead at runtime because of many virtual calls instead of plain method invocations.
Remember that there is only a virtual call overhead if you access the method through a pointer (or reference) to your interface or base class. If you access the method through a concrete object on the stack, there is no virtual overhead, and the call can even be inlined.
Also, never assume that this overhead is big before profiling your code. Almost always, the cost of a virtual call is negligible compared to what your method is doing. (Most of the penalty comes from not being able to inline a one-line method, not from the extra indirection of the call.)

How do you refactor?

I was wondering how other developers begin refactoring. What is your first step? How does this process (refactoring) differ if you refactor code which is not yours? Do you write tests while refactoring?
do not refactor anything non-trivial that does not already have unit tests
write unit tests, then refactor
refactor small pieces and re-run the tests frequently
stop refactoring when the code is DRY* clean
* DRY = Don't Repeat Yourself
What is your first step?
The first step is to run the unit tests to make sure they all pass. Indeed, you can waste a large amount of time looking for which of your changes broke a test if it was already broken before you modified the code.
How does this process differ if you refactor code which is not yours?
I'm certainly taking smaller steps when refactoring code I didn't write (or code I wrote a long time ago). I may also verify the test coverage before proceeding, to avoid relying on unit tests that always pass... but that do not test the area I'm working on.
Do you write tests while refactoring?
I usually don't, but I may add new tests in the following circumstances (list not exhaustive):
the idea of a new test sparks in my mind ("what happens if ...?" - write a test to find out)
I discover a hole in the test coverage
it also depends on the refactoring being performed. When extracting a function, I may create a new test if it can be invoked in a different way than it was before.
Here is some general advice:
The first thing is to maintain a list of the code smells you notice while working on the code. This frees your mind from the burden of remembering everything you saw in the code.
The golden rule is: never refactor when the unit tests do not pass completely.
Refactor when the code is stable, before adding something you know will be impacted by a future refactoring, before you integrate, and above all before you say you're done.
In the absence of unit tests, you'll have to put the part of the code you want to refactor under test. If unit tests are too hard to retrofit, as is usually the case, you can create characterization tests, as recommended by Michael Feathers in Working Effectively with Legacy Code. In short, they are end-to-end tests that allow you to pin down the current behavior of the code (which is not assumed to be working perfectly all the time); a small sketch follows after this list.
Don't be afraid to take baby steps. Do not do two things at the same time. If you notice something else requiring refactoring, note it down; do not fix it right now, even if it seems very easy.
Check in very often, whenever the tests pass, so that you can revert a bad refactoring without losing what was done before.
Keep in mind that refactoring does not add value for your customer (this can be debated); the customer does not pay you to refactor. One rule of thumb is to refactor prior to making changes or adding new capabilities to the code.
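As mentioned above, a characterization test pins down what the code currently does, not what it should do. A minimal sketch, where legacy_quote and the recorded value are invented for illustration:

    import unittest

    class LegacyPricingCharacterizationTest(unittest.TestCase):
        def test_quote_for_a_typical_order_is_unchanged(self):
            # The expected value was captured by running the legacy code once
            # and pasting the observed output into the test, not derived from
            # a specification.
            observed_before_refactoring = 137.5
            self.assertEqual(observed_before_refactoring,
                             legacy_quote(items=3, customer_tier="gold"))

If such a test fails after a refactoring, either the refactoring changed behavior or the recorded value captured a bug worth investigating; either way, the change is visible.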
Read Martin Fowler's book "Refactoring"
BTW - that's Martin Fowler's own Amazon exec link, if you're wondering :)
I take crap and make it less crappy. :-)
Seriously. I don't refactor to create new functionality. Refactoring occurs before new stuff. If there are no tests I write tests to make sure that I'm not breaking anything with my refactoring. If there are tests, I use those. If the tests are insufficient, I might write more tests but I would consider this separate from the refactoring and do it first.
The first step for me is to notice that I can abstract something and make it more general (and useful in other places that need the functionality now), or I notice that something is bad and could be better (subjective). I don't refactor to generality without a reason. YAGNI principle applies.
We have the concept of shared ownership so the code is always mine -- I may not have written it, but I don't consider that when refactoring. I may seek to understand things before I decide it needs refactoring if the purpose isn't clear -- although that's almost always a reason to refactor in and of itself.
Depends very much on my goals. As has been said, you need unit tests to make sure your refactoring hasn't broken anything, and if it has you have to commit time to fixing it. For many situations, I test the existing solution, and if it works, wrap it rather than refactoring it, as this minimises the possibility of a break.
If I have to refactor, for example I recently had to port a bunch of ASCII based C++ to UNICODE, I tend to make sure I have good regression tests that work at end-user as well as unit level. Again, I try to use tools rather than manually refactoring, as this is less prone to error, and the errors you do get are systematic rather than random.
For me first thing is make sure the code hits all of the best practices of our office.
For example, using strict, warnings and taint for our Perl scripts.
If there are efficiency or speed troubles, focus on them.
Things like finding a better algorithm, or finding a better way to do what the quadruply nested for loop is doing.
And lastly, see if there is a way to make the code more readable. This is normally accomplished by turning 5 little scripts that do similar things into 1 module (class).
I refactor whilst writing new code, using unit tests. I'll also refactor old code, either mine or someone else's, if methods are too long, or variables named badly, or I spot duplication etc.
Start with getting unit tests, and then use automated refactoring tools. If the refactoring can't be automated then it's not truly a mechanical transformation of the code and so isn't a refactoring. The unit tests are to make sure that you are truly just performing mechanical transformations from one codebase to an equivalent one.
Refactoring without Unit Test is dangerous. Always have unit test. If you change something without having good testing you might be safe for some portion of the code but something elsewhere might not have the same behavior. With Unit Testing you protect any change.
Refactoring other people's code is fine too, but the extreme is not. It's normal that someone else does not program the way you do. It's not "kind" to change things just because you would have done them the other way. Only refactor if it's really necessary.
I remove duplication, which unifies the thought patterns inherent in the code. Refactoring needs to achieve those two things. If you have code that does the same thing twice, refactor it to a common location, unifying the abstraction. If you have the same literal in three places, put it in a constant, unifying the purpose. If you have the same group of arguments, ensure they're always used in the same order or, better yet, put them in a common structure, unifying information groups.
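A small sketch of that last point (all names invented): the same group of arguments, repeated in varying order, is unified into one structure.

    from dataclasses import dataclass

    # Before: the same trio of arguments travels everywhere, in an order
    # that is easy to get wrong at a call site.
    def schedule(start_hour, duration, room): ...
    def invoice(duration, room, start_hour): ...

    # After: one structure unifies the information group, so every caller
    # and callee agrees on what belongs together.
    @dataclass
    class Booking:
        start_hour: int
        duration: int
        room: str

    def schedule_booking(booking: Booking): ...
    def invoice_booking(booking: Booking): ...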
I'm a lot more reluctant to refactor code written by others than to refactor my own.
If it's been written by one of my predecessors, I generally refactor only within a function. E.g. I might replace an if statement with a switch. Anything much bigger than that is usually out of scope and not within budget.
For my own code, I usually refactor as I'm writing, whenever something looks ugly or starts to smell. It's a lot easier to fix it now instead of waiting for it to cause problems down the road.
I agree with the other posters when you're refactoring code you wrote.
If it's code you didn't write, and especially if there's lots of it, I would start by using tools like fxCop, Visual Studio's Code Analysis, DevPartner -- I'm sure there are other good ones. They would give you ideas about where to start and what the most common coding issues are. I would also do stress testing to see where the bottlenecks are, therefore the greatest return on your effort at improving the code.
I love to refactor my code, but it is possible to overdo it. If you aren't really improving the app's performance, or seriously improving code readability, you should probably stop. There is always the possibility of introducing new bugs when you refactor, especially if you're working without unit tests.
First step: Identify a code smell.
Second step: Consider alternative implementations, the trade-offs of each, and which I accept in terms of which is "better."
Third step: Implement better solution.
This doesn't differ if the code is mine or not as sometimes I may go back over code I wrote months or years ago and it'll look like code from someone else. I may write tests if I'm making new methods or there aren't adequate tests to the code, IMO.
General approach
I may start out by looking at the software (component) from a bird's-eye view. Dependency analysis and graphing tools are a great help here (see later sections). I look for a cycle in either the package-level or the class-level dependencies, and also for classes with too many dependencies. These are good candidates for refactoring.
Tools of the trade
Dependency analysis tools:
Python
Java