What kinds of unit tests pay off the most in business value? - unit-testing

My question assumes that folks already believe that unit tests of some sort are worthwhile and actually write them on their current projects. Let's also assume that unit tests for some parts of the code are not worth writing because they're testing trivial features. Examples are getters/setters, and/or things that a compiler/interpreter will catch immediately. The contrary assumption is that "interesting" code is worth testing.

The level of coverage for a region of code should be directly proportional to a combination of the likelihood of change and the amount of functionality that depends on it.
For example, we have some fairly complex parsing logic at the core of our base control classes. We unit-test the absolute piss out of it because it has to handle input which comes from systems outside our influence (high possibility of change) + ALL of our UI controls depend on it to be stable. The slightest tweak in the base ripples and magnifies through layers of inheritance and dependency.

The unit tests that verify data passed between layers often add the most value
UI <----> Business <---> Data
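As a rough sketch of what such a boundary test can look like (Python, pytest style; OrderService and FakeOrderRepository are made-up names for illustration only), the test checks both what the business layer asks the data layer for and the shape of what it hands back up toward the UI:

```python
# Minimal sketch of testing the data passed across the Business <-> Data boundary.
# The class names below are hypothetical, not from the original answer.

class FakeOrderRepository:
    """Stands in for the data layer; records what the business layer asks for."""
    def __init__(self, rows):
        self.rows = rows
        self.requested_ids = []

    def find_order(self, order_id):
        self.requested_ids.append(order_id)
        return self.rows.get(order_id)


class OrderService:
    """Business layer: translates raw rows into what the UI layer expects."""
    def __init__(self, repository):
        self.repository = repository

    def order_summary(self, order_id):
        row = self.repository.find_order(order_id)
        return {"id": order_id, "total": row["qty"] * row["unit_price"]}


def test_business_layer_passes_correct_data_to_and_from_data_layer():
    repo = FakeOrderRepository({42: {"qty": 3, "unit_price": 2.50}})
    service = OrderService(repo)

    summary = service.order_summary(42)

    assert repo.requested_ids == [42]           # the right id crossed the boundary
    assert summary == {"id": 42, "total": 7.5}  # the right shape was handed up to the UI
```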

I find that the tests with the most pay-off are the ones testing for previously found bugs. In my opinion, if a team were to do nothing but add regression tests for bugs as they fixed them, they would get more bang for their buck than almost any other testing strategy.
By testing what has been broken (and testing it thoroughly), you know that you're testing areas susceptible to breakage.
This of course assumes that you are testing things that break during development not just waiting for QA reports to trickle in after a deploy or something like that.
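A minimal sketch of what such a regression test might look like (Python; apply_discount and the bug number are invented for illustration): one test reproduces the exact input that triggered the original defect, another guards the surrounding behaviour so the fix itself cannot regress.

```python
# Hedged sketch of a regression test pinned to a previously found bug.

def apply_discount(prices, percent):
    """Return discounted prices. The hypothetical original code blew up
    on an empty cart - that is the bug being pinned here."""
    return [p * (1 - percent / 100) for p in prices]


def test_bug_1234_empty_cart_no_longer_raises():
    # Reproduces the exact input that triggered the original defect.
    assert apply_discount([], 10) == []


def test_bug_1234_discount_still_correct_for_normal_carts():
    # Guards the surrounding behaviour so the fix can't quietly break it.
    assert apply_discount([100.0], 50) == [50.0]
```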

As the question is asked, I have no doubt whatsoever that functional tests, like Selenium tests, provide the greatest business value. A Selenium (web) test clicks through the application and asserts the expected behavior, as defined by the user stories/use cases. These tests assert that the main "thing" your software is trying to communicate to the customer is good.
If these things do not work, your customer will lose confidence in your ability to produce credible software, which takes away a fair amount of business value.
Now there may be a fair amount of details that these tests do not cover, but I really believe they are essential in the creation of trust, especially when talking to a mass market.
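For illustration, a minimal Selenium (Python) click-through test might look like the sketch below; the URL and element IDs are hypothetical, and a real test would add explicit waits rather than relying on the page being ready immediately.

```python
# Sketch of a Selenium test that clicks through a user story and asserts
# the expected behaviour. All page details here are assumptions.

from selenium import webdriver
from selenium.webdriver.common.by import By


def test_user_can_log_in_and_see_dashboard():
    driver = webdriver.Chrome()
    try:
        driver.get("https://example.test/login")
        driver.find_element(By.ID, "username").send_keys("demo")
        driver.find_element(By.ID, "password").send_keys("secret")
        driver.find_element(By.ID, "submit").click()

        # Assert the behaviour the user story promises: the dashboard greets the user.
        heading = driver.find_element(By.ID, "dashboard-title").text
        assert "Dashboard" in heading
    finally:
        driver.quit()
```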

To take a shot at my own question, here are the modules I've found worthwhile testing:
a) Any of the business rules. On my current application the business rules are externalized from the business objects.
b) Any of the virtual attributes on the business objects. Often these virtual attributes are backed by involved algorithms.
c) Major components where we can state the expected produced output for a given input consumed. Often this bleeds into integration testing.
d) Our development process requires us to check in unit tests for new features and bug fixes. For bug fixes, we attempt to duplicate exactly what caused the bug and show the unit test can pass once the code fix is checked in.
e) Any of our translation layers where we translate from one data representation type to another such as POJO-XML
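As an illustration of (e), here is a hedged sketch of a translation-layer round-trip test, using Python's standard library in place of POJO-XML; the Customer class and tag names are assumptions made for the example.

```python
# Sketch of a translation-layer test: object -> XML -> object must round-trip.

import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    email: str


def customer_to_xml(customer):
    root = ET.Element("customer")
    ET.SubElement(root, "name").text = customer.name
    ET.SubElement(root, "email").text = customer.email
    return ET.tostring(root, encoding="unicode")


def xml_to_customer(xml_text):
    root = ET.fromstring(xml_text)
    return Customer(name=root.findtext("name"), email=root.findtext("email"))


def test_customer_survives_round_trip_through_xml():
    original = Customer(name="Ada", email="ada@example.com")
    assert xml_to_customer(customer_to_xml(original)) == original
```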

Discussed in SO Podcast 41 - this part is not in the transcript at the moment.
The responses in the blog contain further discussion of unit test coverage value.
Joel worries that excessive TDD (say, going from 90% to 100% test coverage) cuts time from other activities that could benefit the software product, such as better usability or additional features users are clamoring for.
Unit tests are absolutely useful as a form of “eating your own dogfood”, and documenting the behavior of your system. Even if you disregard the actual results of the tests completely, they’re still valuable as documentation and giving you a fresh outside perspective on your codebase.
Joel notes the difficulty of testing the UI (web or executable) versus testing the code behind the UI. The classic method of doing this is probably documented in The Art of UNIX Programming, where you start with a command-line app that takes in and spits out text. The GUI is simply a layer you paste on top of the command-line app, which leads to perfect testability — but perhaps not such great apps, in the long run. Which is more important?

All reasonable unit tests are valuable, but the ones that tend to save the most time in my experience, thus yielding the best ROI, are unit tests for bugfixes. Mostly because the bugfixes apply to the most difficult/brittle parts of the code.

TDD helps you write your code faster, so only TDD when going faster would provide more business value. ;)

Tests that can be reused in many places. Examples from my current project:
Syntax-checking of configuration files
Checking that model properties referred to in RTF templates actually exist in the model classes (people change the classes and usually forget to adjust the templates)
Checking of some oft-overlooked style guide rules (each UI mask must have a title bar text, each button must have a unique mnemonic, etc.)
Even programmers who are skeptical about unit tests love these because they can use them with a minimal amount of work - and in some cases it leads them towards writing unit tests of their own.
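A sketch of the first item in that list, assuming JSON configuration files under a config/ directory (adjust to whatever format and location the project actually uses):

```python
# Hedged sketch of a reusable "syntax-check every configuration file" test.
# The directory layout and file format are assumptions for the example.

import json
import pathlib


def test_all_config_files_are_valid_json():
    config_dir = pathlib.Path("config")  # assumed location; passes vacuously if absent
    failures = []
    for path in config_dir.glob("**/*.json"):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            failures.append(f"{path}: {exc}")
    assert not failures, "Invalid config files:\n" + "\n".join(failures)
```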

Related

Test Driven Development (TDD) for User Interface (UI) with functional tests

As we know, TDD means "write the test first, and then write the code". And when it comes to unit-testing, this is fine, because you are limited within the "unit".
However, when it comes to UI, writing functional tests beforehand makes less sense (to me). This is because the functional tests have to verify a (possibly long) set of functional requirements. These may span multiple screens (pages) and preconditions like "logged in", "having recently inserted a record", etc.
According to Wikipedia:
Test-driven development is difficult to use in situations where full functional tests are required to determine success or failure. Examples of these are user interfaces, programs that work with databases, and some that depend on specific network configurations.
(Of course, Wikipedia is no "authority", but this sounds very logical.)
So, any thoughts or, better, experience with writing the functional tests first for the UI and then the code? Does it work? And is it a "pain"?
Try BDD, Behavior-Driven Development. It promotes writing specification stories which are then executed step by step, stimulating the app to change its state and verifying the outcomes.
I use BDD scenarios to write UI code. Business requests are described using BDD stories, and then the functionality is written to turn the stories, i.e. the tests, green.
The key to testing a UI is to separate your concerns - the behavior of your UI is actually different than the appearance of your UI. We struggle with this mentally, so as an exercise take a game like Tetris and imagine porting it from one platform (say PC) to another (the web). The intuition is that everything is different - you have to rewrite everything! But in reality all this is the same:
The rules of the game.
The speed of the blocks falling.
The logic for rows matching
Which block is chosen
And more...
You get the idea. The only thing that changes is how the screen is drawn. So separate how your UI looks from how it works. This is tricky, and usually can't be perfect, but it's close. My recommendation is to write the UI last. Tests will provide the feedback if your behavior is working, and the screen will tell if it looks right. This combination provides the fast feedback we're looking for from TDD without a UI.
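To make the separation concrete, here is a toy sketch (Python): the row-clearing rule is tested with no rendering code in sight. The Board class is a stand-in invented for the example, not a real engine.

```python
# Sketch: game rules tested without any drawing code.

class Board:
    """Toy stand-in for the game rules; no rendering anywhere."""

    def __init__(self, rows):
        self.rows = rows  # each row is a list of 0/1 cells

    def clear_full_rows(self):
        """Remove completely filled rows, pad with empty rows on top,
        and return how many rows were cleared."""
        width = len(self.rows[0])
        remaining = [row for row in self.rows if not all(row)]
        cleared = len(self.rows) - len(remaining)
        self.rows = [[0] * width for _ in range(cleared)] + remaining
        return cleared


def test_full_rows_are_cleared_and_replaced_from_the_top():
    board = Board([[0, 1],
                   [1, 1],
                   [1, 1]])
    assert board.clear_full_rows() == 2
    assert board.rows == [[0, 0],
                          [0, 0],
                          [0, 1]]
```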
TDD for functional tests makes sense to me. Write a functional test, see it failing, then divide the problem into parts, write a unit test for each part, write code for each part, see unit tests pass and then your functional test should pass.
This workflow is also suggested in the book Test-Driven Development with Python by Harry Percival (available online for free).
P.S. You can automate your functional tests using e.g. Selenium. You can add them to the Continuous Integration cycle as well as unit tests.
ATDD is very useful if you have a very good grasp of how the UI should behave and can also keep the cost of changing the functional tests you build upfront low.
The biggest problem with doing this, of course, is that most often the UI is not fully specced out.
For example, if you are building a product, and are still doing rapid iterations of getting a UI that goes through a usability testing and the feedback incorporated, you do not want the baggage of fixing your functional tests with every small change to the UI.
Given that functional tests are generally slow, the feedback cycle is long and it's very painful to keep them green as the UI changes.
I work on a product team where we had this exact same decision to make. A colleague of mine has summarized our final approach very well here. (Disclaimer: Please ignore the tool specific details there.)
I've done Acceptance TDD with a UI. We would assert the common header and footer were used via XPath queries for the appropriate ids. We also used XPath to assert the data appeared in the correct tag relative to whatever ids we used for basic layout and structure. We would also assert the output page is valid HTML 4.01 Strict.
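Something along those lines, sketched in Python with lxml (the ids and markup here are illustrative, not from the original project):

```python
# Sketch of XPath-style assertions against a rendered page.

from lxml import etree


def test_page_uses_common_header_and_footer_and_shows_data_in_the_right_place():
    html = """
    <html><body>
      <div id="site-header">Acme</div>
      <table id="results"><tr><td class="total">42</td></tr></table>
      <div id="site-footer">Acme footer</div>
    </body></html>
    """
    tree = etree.HTML(html)

    # The shared layout elements are present.
    assert tree.xpath("//div[@id='site-header']")
    assert tree.xpath("//div[@id='site-footer']")

    # The data appears in the expected element relative to the layout ids.
    assert tree.xpath("//table[@id='results']//td[@class='total']/text()") == ["42"]
```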
Programmatic UI tests are a salvation when you want to be confident your UI does what is expected. UI tests are closer to BDD (behavior-driven development) than to TDD. But the terminology is cloudy, and however you call 'em, they are useful!
I have had rather good experience using Cucumber. I use it to test Flex applications, but it's mostly used to test web apps. The site has really nice examples of the methodology.
Advanced UI is not compatible with straight TDD.
Use XP practices to guide you.
It is unwise to slavishly follow TDD (or even BDD) for UI; it's worth thinking about the root goals of both practices. In particular, we want to avoid using general-purpose TDD tools for UI. They're simply not designed for that use case, and you end up binding yourself to tests that you have to iterate on as rapidly as the UI code (I call this UI test-lock).
That's not the purpose of TDD. We don't do it to go slower; we use it to go fast (and NOT break things!).
We want to achieve a few things with TDD:
Code / Algorithmic Correctness
Minimal code to complete a test
Fast feedback
Architectural separation of concerns to the units (modular parts under test, no more, no less)
Formalize specification in a useful/usable way
To some extent, document what the module/system does (e.g. provide an example of use.)
We can achieve these goals in UI development, but how and where we apply practices is important to avoid test-lock.
Applying these principles to UI
What benefits can we get from testing UI? How can we avoid test-lock?
With usual TDD one major benefit is that we can think about the abstractions, and generally design the shape of a test subject, as we think of names, relationships etc.
For UI, much of what we want to do, especially if it's non-trivial, is only reasoned about effectively when we can see it manifested on screen.
Writing tests that describe what we expect to see in a UI, specifically what data, can be TDD tested. For example, if we have a view that should show an account balance, a test that expects it to appear in an on-screen element that can be addressed by some form of ID, is good. We will know our app/view is displaying content that's expected, in standard UI library elements.
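A minimal sketch of that kind of test (Python; render_account_view and the element id are hypothetical stand-ins for whatever the real view layer produces):

```python
# Sketch: assert the expected data appears in an element addressable by ID.

def render_account_view(balance_cents):
    # Hypothetical view function; in a real app this would come from the UI framework.
    return f'<span id="account-balance">${balance_cents / 100:.2f}</span>'


def test_balance_is_shown_in_the_element_with_the_expected_id():
    html = render_account_view(123456)
    assert 'id="account-balance"' in html
    assert "$1234.56" in html
```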
If, on the other hand, we want to create a more complex UI and wish to apply TDD to it, we will have some problems. Chances are, we won't even know how to code it, if it's novel enough.
For this we would prototype and iterate with tools which give us fast feedback, but don't saddle us with any need to write tests first. When we reach a point where implementation becomes clear, we can then switch up to test first for the production code.
This keeps us fast. Designers and Product owners in the team can also see results, provide inputs, and tweaks.
The code should be well-formed. As soon as we identify small potential pieces which will make it to production code, we can migrate them to be under test. Use TCR or a similar method to add tests, or simply use the TDD method to rewrite. Remember, once it works, you can apply snapshot/record-playback testing to keep regression errors at bay.
Do what works, keep it simple.
Addendum
Notes about the scope of UI discussed above
There will be a need to figure out how to make things work as expected in any advanced UI. There is also a likelihood that specs change and tweak in extremely arbitrary / capricious ways. We need to ensure any tests we apply to these test subjects are flexible, or extremely easy to re-generate based on a working model. (replay tests are gold for this.)
Good development practice, in a nutshell code separation, is going to help you ensure the code is at least testable, and simpler to debug, maintain and reason about.
Don't be a slave to ritual.
It doesn't make you correct. Just slower.

Is there such a thing as too much unit testing?

I tried looking through all the pages about unit tests and could not find this question. If this is a duplicate, please let me know and I will delete it.
I was recently tasked to help implement unit testing at my company. I realized that I could unit test all the Oracle PL/SQL code, Java code, HTML, JavaScript, XML, XSLT, and more.
Is there such a thing as too much unit testing? Should I write unit tests for everything above or is that overkill?
This depends on the project and its tolerance for failure. There is no single answer. If you can risk a bug, then don't test everything.
When you have tons of tests, it is also likely you will have bugs in your tests, adding to your headaches.
Test what needs testing and leave what does not, which often means leaving out the fairly simple stuff.
Is there such a thing as too much unit testing?
Sure. The problem is finding the right balance between enough unit testing to cover the important areas of functionality, and focusing effort on creating new value for your customers in terms of system functionality.
Unit testing code vs. leaving code uncovered by tests both have a cost.
The costs of excluding code from unit testing may include (but aren't limited to):
Increased development time due to fixing issues you can't automatically test
Fixing problems discovered during QA testing
Fixing problems discovered when the code reaches your customers
Loss of revenue due to customer dissatisfaction with defects that made it through testing
The costs of writing a unit test include (but aren't limited to):
Writing the original unit test
Maintaining the unit test as your system evolves
Refining the unit test to cover more conditions as you discover them in testing or production
Refactoring unit tests as the underlying code under test is refactored
Lost revenue when it takes longer for your application to reach the market
The opportunity cost of implementing features that could drive sales
You have to make your best judgement about what these costs are likely to be, and what your tolerance is for absorbing such costs.
In general, unit testing costs are mostly absorbed during the development phase of a system - and somewhat during its maintenance. If you spend too much time writing unit tests you may miss a valuable window of opportunity to get your product to market. This could cost you sales or even long-term revenue if you operate in a competitive industry.
The cost of defects is absorbed during the entire lifetime of your system in production - up until the point the defect is corrected. And potentially even beyond that, if the defect is significant enough that it affects your company's reputation or market position.
Kent Beck of JUnit and JUnitMax fame answered a similar question of mine.
The question has slightly different semantics but the answer is definitely relevant
The purpose of unit tests is generally to make it possible to refactor or change with greater assurance that you did not break anything. If a change is scary because you do not know if you will break anything, you probably need to add a test. If a change is tedious because it will break a lot of tests, you probably have too many tests (or too fragile tests).
The most obvious case is the UI. What makes a UI look good is something that is hard to test, and using a master example tends to be fragile. So the layer of the UI involving the look of something tends not to be tested.
The other times it might not be worth it is if the test is very hard to write and the safety it gives is minimal.
For HTML I tended to check that the data I wanted was there (using XPath queries), but did not test the entire HTML. Similarly for XSLT and XML. In JavaScript, when I could I tested libraries but left the main page alone (except that I moved most code into libraries). If the JavaScript is particularly complicated I would test more. For databases I would look into testing stored procedures and possibly views; the rest is more declarative.
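For the XSLT case, a hedged sketch using lxml: feed a small known input through the transform and assert only on the data you care about, not the whole output. The stylesheet here is a toy written inline for illustration.

```python
# Sketch: test an XSLT transform by checking just the data that matters.

from lxml import etree

STYLESHEET = etree.XSLT(etree.XML("""
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/order">
    <summary><total><xsl:value-of select="total"/></total></summary>
  </xsl:template>
</xsl:stylesheet>
"""))


def test_xslt_keeps_the_total_i_care_about():
    result = STYLESHEET(etree.XML("<order><total>99.90</total></order>"))
    summary = result.getroot()
    assert summary.tag == "summary"
    assert summary.findtext("total") == "99.90"
```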
However, in your case first start with the stuff that worries you the most or is about to change, especially if it is not too difficult to test. Check the book Working Effectively with Legacy Code for more help.
Yes, there is such a thing as too much unit testing. One example would be unit testing in a whitebox manner, such that you're effectively testing the specific implementation; such testing would effectively slow down progress and refactoring by requiring compliant code to need new unit tests (because the tests were dependent upon specific implementation details).
I suggest that in some situations you might want automated testing, but no 'unit' testing at all (Should one test internal implementation, or only test public behaviour?), and that any time spent writing unit tests would be better spent writing system tests.
While more tests is usually better (I have yet to be on a project that actually had too many tests), there's a point at which the ROI bottoms out, and you should move on. I'm assuming you have finite time to work on this project, by the way. ;)
Adding unit tests has some amount of diminishing returns -- after a certain point (Code Complete has some theories), you're better off spending your finite amount of time on something else. That may be more testing/quality activities like refactoring and code review, usability testing with real human users, etc., or it could be spent on other things like new features, or user experience polish.
As EJD said, you can't verify the absence of errors.
This means there are always more tests you could write. Any of these could be useful.
What you need to understand is that unit-testing (and other types of automated testing you use for development purposes) can help with development, but should never be viewed as a replacement for formal QA.
Some tests are much more valuable than others.
There are parts of your code that change a lot more frequently, are more prone to break, etc. These are the most economical tests.
You need to balance out the amount of testing you agree to take on as a developer. You can easily overburden yourself with unmaintainable tests. IMO, unmaintainable tests are worse than no tests because they:
Turn others off from trying to maintain a test suite or write new tests.
Detract from you adding new, meaningful functionality. If automated testing is not a net-positive result, you should ditch it like other engineering practices.
What should I test?
Test the "Happy Path" - this ensures that you get interactions right, and that things are wired together properly. But you don't adequately test a bridge by driving down it on a sunny day with no traffic.
Pragmatic Unit Testing recommends you use Right-BICEP to figure out what to test. "Right" for the happy path, then Boundary conditions, check any Inverse relationships, use another method (if it exists) to Cross-check results, force Error conditions, and finally take into account any Performance considerations that should be verified. I'd say if you are thinking about tests to write in this way, you'll most likely figure out how to get to an adequate level of testing. You'll be able to figure out which ones are more useful and when. See the book for much more info.
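As a rough illustration of Right-BICEP applied to a tiny function (Python, pytest; safe_sqrt is an invented example, not from the book):

```python
# Sketch: happy path (Right), a Boundary, an Inverse check, a Cross-check,
# and a forced Error condition, each as its own small test.

import math
import pytest


def safe_sqrt(x):
    if x < 0:
        raise ValueError("negative input")
    return math.sqrt(x)


def test_right_happy_path():
    assert safe_sqrt(9) == 3


def test_boundary_zero():
    assert safe_sqrt(0) == 0


def test_inverse_squaring_the_result_gives_back_the_input():
    assert safe_sqrt(7) ** 2 == pytest.approx(7)


def test_cross_check_against_the_power_operator():
    assert safe_sqrt(7) == pytest.approx(7 ** 0.5)


def test_error_condition_negative_input_is_rejected():
    with pytest.raises(ValueError):
        safe_sqrt(-1)
```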
Test at the right level
As others have mentioned, unit tests are not the only way to write automated tests. Other types of frameworks may be built off of unit tests, but provide mechanisms to do package level, system or integration tests. The best bang for the buck may be at a higher level, and just using unit testing to verify a single component's happy path.
Don't be discouraged
I'm painting a more grim picture here than I expect most developers will find in reality. The bottom line is that you make a commitment to learn how to write tests and write them well. But don't let fear of the unknown scare you into not writing any tests. Unlike production code, tests can be ditched and rewritten without many adverse effects.
Unit test any code that you think might change.
You should only really write unit tests for any code which you have written yourself. There is no need to test the functionality inherently provided to you.
For example, If you've been given a library with an add function, you should not be testing that add(1,2) returns 3. Now if you've WRITTEN that code, then yes, you should be testing it.
Of course, whoever wrote the library may not have tested it and it may not work... in which case you should write it yourself or get a separate one with the same functionality.
Well, you certainly shouldn't unit test everything, but at least the complicated tasks or those that will most likely contain errors/cases you haven't thought of.
The point of unit testing is being able to run a quick set of tests to verify that your code is correct. This lets you verify that your code matches your specification and also lets you make changes and ensure that they don't break anything.
Use your judgement. You don't want to spend all of your time writing unit tests or you won't have any time to write actual code to test.
When you've unit tested your unit tests, thinking that you have thereby achieved 200% coverage.
There is a development approach called test-driven development which essentially says that there is no such thing as too much (non-redundant) unit testing. That approach, however, is not a testing approach, but rather a design approach which relies on working code and a more or less complete unit test suite with tests which drive every single decision made about the codebase.
In a non-TDD situation, automated tests should exercise every line of code you write (in particular Branch coverage is good), but even then there are exceptions - you shouldn't be testing vendor-supplied platform or framework code unless you know for certain that there are bugs which will affect you in that platform. You shouldn't be testing thin wrappers (or, equally, if you need to test it, the wrapper is not thin). You should be testing all core business logic, and it is certainly helpful to have some set of tests that exercise your database at some elemental level, although those tests will never work in the common situation where unit tests are run every time you compile.
Specifically with regard to databases: database testing is intrinsically slow and, depending on how much logic is held in your database, quite difficult to get right. Typically things like DBs, HTML/XML documents & templating, and other document-ish aspects of a program are verified more so than tested. The difference is usually that testing tries to exercise execution paths whereas verification tries to verify inputs and outputs directly.
To learn more about this I would suggest reading up on "Code Coverage". There is a lot of material available if you're curious about this.

Are unit tests and acceptance tests enough?

If I have unit tests for each class and/or member function and acceptance tests for every user story do I have enough tests to ensure the project functions as expected?
For instance if I have unit tests and acceptance tests for a feature do I still need integration tests or should the unit and acceptance tests cover the same ground? Is there overlap between test types?
I'm talking about automated tests here. I know manual testing is still needed for things like ease of use, etc.
If I have unit tests for each class and/or member function and acceptance tests for every user story do I have enough tests to ensure the project functions as expected?
No. Tests can only verify what you have thought of. Not what you haven't thought of.
I'd recommend reading chapters 20 - 22 in the 2nd edition of Code Complete. It covers software quality very well.
Here's a quick breakdown of some of the key points (all credit goes to McConnell, 2004)
Chapter 20 - The Software-Quality Landscape:
No single defect-detection technique is completely effective by itself
The earlier you find a defect, the less intertwined it will become with the rest of your code and the less damage it will cause
Chapter 21 - Collaborative Construction:
Collaborative development practices tend to find a higher percentage of defects than testing and to find them more efficiently
Collaborative development practices tend to find different kinds of errors than testing does, implying that you need to use both reviews and testing to ensure the quality of your software
Pair programming typically costs about the same as inspections and produces similar-quality code
Chapter 22 - Developer Testing:
Automated testing is useful in general and is essential for regression testing
The best way to improve your testing process is to make it regular, measure it, and use what you learn to improve it
Writing test cases before the code takes the same amount of time and effort as writing the test cases after the code, but it shortens defect-detection-debug-correction-cycles (Test Driven Development)
As far as how you are formulating your unit tests, you should consider basis testing, data-flow analysis, boundary analysis etc. All of these are explained in great detail in the book (which also includes many other references for further reading).
Maybe this isn't exactly what you were asking, but I would say automated testing is definitely not enough of a strategy. You should also consider such things as pair programming, formal reviews (or informal reviews, depending on the size of the project) and test scaffolding along with your automated testing (unit tests, regression testing etc.).
The idea of multiple testing cycles is to catch problems as early as possible when things change.
Unit tests should be done by the developers to ensure the units work in isolation.
Acceptance tests should be done by the client to ensure the system meets the requirements.
However, something has changed between those two points that should also be tested. That's the integration of units into a product before being given to the client.
That's something that should first be tested by the product creator, not the client. The minute you involve the client, things slow down, so the more fixes you can do before they get their grubby little hands on it, the better.
In a big shop (like ours), there are unit tests, integration tests, globalization tests, master-build tests and so on at each point where the deliverable product changes. Only once all high severity bugs are fixed (and a plan for fixing low priority bugs is in place) do we unleash the product to our beta clients.
We do not want to give them a dodgy product simply because fixing a bug at that stage is a lot more expensive (especially in terms of administrivia) than anything we do in-house.
It's really impossible to know whether or not you have enough tests based simply on whether you have a test for every method and feature. Typically I will combine testing with coverage analysis to ensure that all of my code paths are exercised in my unit tests. Even this is not really enough, but it can be a guide to where you may have introduced code that isn't exercised by your tests. This should be an indication that more tests need to be written or, if you're doing TDD, you need to slow down and be more disciplined. :-)
Tests should cover both good and bad paths, especially in unit tests. Your acceptance tests may be more or less concerned with the bad path behavior but should at least address common errors that may be made. Depending on how complete your stories are, the acceptance tests may or may not be adequate. Often there is a many-to-one relationship between acceptance tests and stories. If you only have one automated acceptance test for every story, you probably don't have enough unless you have different stories for alternate paths.
Multiple layers of testing can be very useful. Unit tests to make sure the pieces behave; integration to show that clusters of cooperating units cooperate as expected, and "acceptance" tests to show that the program functions as expected. Each can catch problems during development. Overlap per se isn't a bad thing, though too much of it becomes waste.
That said, the sad truth is that you can never ensure that the product behaves "as expected", because expectation is a fickle, human thing that gets translated very poorly onto paper. Good test coverage won't prevent a customer from saying "that's not quite what I had in mind...". Frequent feedback loops help there. Consider frequent demos as a "sanity test" to add to your manual mix.
Probably not, unless your software is really, really simple and has only one component.
Unit tests are very specific, and you should cover everything thoroughly with them. Go for high code-coverage here. However, they only cover one piece of functionality at a time and not how things work together. Acceptance tests should cover only what the customer really cares about at a high level, and while it will catch some bugs in how things work together, it won't catch everything as the person writing such tests will not know about the system in depth.
Most importantly, these tests may not be written by a tester. Unit tests should be written by developers and run frequently (up to every couple minutes, depending on coding style) by the devs (and by the build system too, ideally). Acceptance tests are often written by the customer or someone on behalf of the customer, thinking about what matters to the customer. However, you also need tests written by a tester, thinking like a tester (and not like a dev or customer).
You should also consider the following sorts of tests, which are generally written by testers:
Functional tests, which will cover pieces of functionality. This may include API testing and component-level testing. You will generally want good code-coverage here as well.
Integration tests, which put two or more components together to make sure that they work together. You don't want one component to put out the position in the array where the object is (0-based) when the other component expects the count of the object ("nth object", which is 1-based), for example (see the sketch after this list). Here, the focus is not on code coverage but on coverage of the interfaces (general interfaces, not code interfaces) between components.
System-level testing, where you put everything together and make sure it works end-to-end.
Testing for non-functional features, like performance, reliability, scalability, security, and user-friendliness (there are others; not all will relate to every project).
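A toy sketch of that index-versus-count mismatch (Python): each component's own unit tests could pass while the hand-off between them is wrong, which is exactly what an integration-level test pins down. The function names are made up for the example.

```python
# Sketch: an integration test covering the interface between two components.

def find_position(items, target):
    """Component A: returns a 0-based index."""
    return items.index(target)


def describe_position(n):
    """Component B: expects a 1-based 'nth item' count."""
    return f"item number {n}"


def describe_target(items, target):
    # The integration point: the +1 conversion is exactly the kind of detail
    # an integration test pins down, even when both components pass their own tests.
    return describe_position(find_position(items, target) + 1)


def test_components_agree_on_how_positions_are_counted():
    assert describe_target(["a", "b", "c"], "a") == "item number 1"
```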
Integration tests are for when your code integrates with other systems such as 3rd party applications, or other in house systems such as the environment, database etc. Use integration tests to ensure that the behavior of the code is still as expected.
In short no.
To begin with, your story cards should have acceptance criteria. That is, acceptance criteria specified by the product owner in conjunction with the analyst, specifying the behavior required; if met, the story card will be accepted.
The acceptance criteria should drive the automated unit test (done via TDD) and the automated regression/ functional tests which should be run daily. Remember we want to move defects to the left, that is, the sooner we find ‘em the cheaper and faster they are to fix. Furthermore, continuous testing enables us to refactor with confidence. This is required to maintain a sustainable pace for development.
In addition, you need automated performance tests. Running a profiler daily or overnight would provide insight into the consumption of CPU and memory and whether any memory leaks exist. Furthermore, a tool like LoadRunner will enable you to place a load on the system that reflects actual usage. You will be able to measure response times and CPU and memory consumption on the production-like machine running LoadRunner.
The automated performance test should reflect actual usage of the app. You measure the number of business transactions (i.e., for a web application, clicks on a page and the responses to the users, or round trips to the server) and determine the mix of such transactions along with the rate at which they arrive per second. Such information will enable you to properly design the automated LoadRunner test required to performance test the application. As is often the case, some of the performance issues will trace back to the implementation of the application while others will be determined by the configuration of the server environment.
Remember, your application will be performance tested. The question is whether the first performance test happens before or after you release the software. Believe me, the worst place to have a performance problem is in production. Performance issues can be the hardest to fix and can cause a deployment to all users to fail, thus cancelling the project.
Finally, there is User Acceptance Testing (UAT). These are tests designed by the product owner/business partner to test the overall system prior to release. In general, because of all the other testing, it is not uncommon for the application to return zero defects during UAT.
It depends on how complex your system is. If your acceptance tests (which satisfy the customers requirements) exercise your system from front to back, then no you don't.
However, if your product relies on other tiers (like backend middleware/database) then you do need a test that proves that your product can happily link up end-to-end.
As other people have commented, tests don't necessarily prove the project functions as expected, just how you expect it to work.
Frequent feedback loops to the customer and/or tests that are written/parsable in a way the customer understands (say, for example, in a BDD style) can really help.
If I have unit tests for each class and/or member function and acceptance tests for every user story do I have enough tests to ensure the project functions as expected?
This is enough to show your software is functionally correct, at least as far as your test coverage is sufficient. Now, depending on what you're developing, there certainly are non-functional requirements that matter; think about reliability, performance and scalability.
Technically, a full suite of acceptance tests should cover everything. That being said, they're not "enough" for most definitions of enough. By having unit tests and integration tests, you can catch bugs/issues earlier and in a more localized manner, making them much easier to analyze and fix.
Consider that a full suite of manually executed tests, with the directions written on paper, would be enough to validate that everything works as expected. However, if you can automate the tests, you'd be much better off because it makes doing the testing that much easier. The paper version is "complete", but not "enough". In the same way, each layer of tests adds more to the value of "enough".
It's also worth noting that the different sets of tests tend to test the product/code from a different "viewpoint". Much the same way QA may pick up bugs that dev never thought to test for, one set of tests may find things the other set wouldn't.
Acceptance testing can even be made manually by the client if the system in hand is small.
Unit and small integration tests (consisting of unit like tests) are there for you to build a sustainable system.
Don't try to write test for each part of the system. That is brittle (easy to break) and overwhelming.
Decide on the critical parts of the system that take too much time to test manually, and write acceptance tests only for those parts to make things easy for everyone.

When to unit-test vs manual test

While unit-testing seems effective for larger projects where the APIs need to be industrial strength (for example development of the .Net framework APIs, etc.), it seems possibly like overkill on smaller projects.
When is the automated TDD approach the best way, and when might it be better to just use manual testing techniques, log the bugs, triage, fix them, etc.
Another issue--when I was a tester at Microsoft, it was emphasized to us that there was a value in having the developers and testers be different people, and that the tension between these two groups could help create a great product in the end. Can TDD break this idea and create a situation where a developer might not be the right person to rigorously find their own mistakes? It may be automated, but it would seem that there are many ways to write the tests, and that it is questionable whether a given set of tests will "prove" that quality is acceptable.
The effectiveness of TDD is independent of project size. I will practice the three laws of TDD even on the smallest programming exercise. The tests don't take much time to write, and they save an enormous amount of debugging time. They also allow me to refactor the code without fear of breaking anything.
TDD is a discipline similar to the discipline of double-entry bookkeeping practiced by accountants. It prevents errors in-the-small. Accountants will enter every transaction twice; once as a credit, and once as a debit. If no simple errors were made, then the balance sheet will sum to zero. That zero is a simple spot check that prevents the executives from going to jail.
By the same token programmers write unit tests in advance of their code as a simple spot check. In effect, they write each bit of code twice; once as a test, and once as production code. If the tests pass, the two bits of code are in agreement. Neither practice protects against larger and more complex errors, but both practices are nonetheless valuable.
The practice of TDD is not really a testing technique, it is a development practice. The word "test" in TDD is more or less a coincidence. As such, TDD is not a replacement for good testing practices, and good QA testers. Indeed, it is a very good idea to have experienced testers write QA test plans independently (and often in advance of) the programmers writing the code (and their unit tests).
It is my preference (indeed my passion) that these independent QA tests are also automated using a tool like FitNesse, Selenium, or Watir. The tests should be easy to read by business people, easy to execute, and utterly unambiguous. You should be able to run them at a moment's notice, usually many times per day.
Every system also needs to be tested manually. However, manual testing should never be rote. A test that can be scripted should be automated. You only want to put humans in the loop when human judgement is needed. Therefore humans should be doing exploratory testing, not blindly following test plans.
So, the short answer to the question of when to unit-test versus manual test is that there is no "versus". You should write automated unit tests first for the vast majority of the code you write. You should have automated QA acceptance tests written by testers. And you should also practice strategic exploratory manual testing.
Unit tests aren't meant to replace functional/component tests. Unit tests are really focused, so they won't be hitting the database, external services, etc. Integration tests do that, but you can have them really focused. The bottom line is that, on the specific question, the answer is that they don't replace those manual tests.
Now, automated functional tests + automated component tests can certainly replace manual tests. Who will actually write those depends a lot on the project and the approach taken to it.
Update 1: Note that if developers are creating automated functional tests you still want to review that those have the appropriate coverage, complementing them as appropriate. Some developers create automated functional tests with their "unit" test framework, because they still have to do smoke tests regardless of the unit tests, and it really helps having those automated :)
Update 2: Unit testing isn't overkill for a small project, nor is automating the smoke tests or using TDD. What is overkill is having the team do any of that for the first time on a small project. Doing any of those has an associated learning curve (especially unit testing or TDD), and it won't always be done right at first. You also want someone who has been doing it for a while involved, to help avoid pitfalls and get past some coding challenges that aren't obvious when starting on it. The issue is that it isn't common for teams to have these skills.
TDD is the best approach whenever it is feasible. TDD testing is an automatic, quantifiable (through code coverage), and reliable method of ensuring code quality.
Manual testing requires a huge amount of time (as compared to TDD) and suffers from human error.
There is nothing saying that TDD means only developers test. Developers should be responsible for coding a percentage of the test framework. QA should be responsible for a much larger portion. Developers test APIs the way they want to test them. QA tests APIs in ways that I really wouldn't have ever thought to and do things that, while seemingly crazy, are actually done by customers.
I would say that unit tests are a programmer's aid to answer the question:
Does this code do what I think it does?
This is a question they need to ask themselves a lot. Programmers like to automate anything they do a lot where they can.
The separate test team needs to answer a different question:
Does this system do what I (and the end users) expect it to do? Or does it surprise me?
There is a whole massive class of bugs, related to the programmer or designer having a different idea about what is correct, that unit tests will never pick up.
According to studies of various projects (1), Unit tests find 15..50% of the defects (average of 30%). This doesn't make them the worst bug finder in your arsenal, but not a silver bullet either. There are no silver bullets, any good QA strategy consists of multiple techniques.
A test that is automated runs more often, thus it will find defects earlier and reduce total cost of these immensely - that is the true value of test automation.
Invest your resources wisely and pick the low-hanging fruit first.
I find that automated tests are easiest to write and to maintain for small units of code - isolated functions and classes. End user functionality is easier tested manually - and a good tester will find many oddities beyond the required tests. Don't set them up against each other, you need both.
Dev vs. testers: Developers are notoriously bad at testing their own code; the reasons are psychological, technical and, last but not least, economical - testers are usually cheaper than developers. But developers can do their part and make testing easier. TDD makes testing an intrinsic part of program construction, not just an afterthought; that is the true value of TDD.
Another interesting point about testing: There's no point in 100% coverage. Statistically, bugs follow an 80:20 rule - the majority of bugs are found in small sections of code. Some studies suggest that this is even sharper - and tests should focus on the places where bugs turn up.
(1) Programming Productivity, Jones 1986, among others; quoted from Code Complete, 2nd ed. But as others have said, unit tests are only one part of testing; integration, regression and system tests can be - at least partially - automated as well.
My interpretation of the results: "many eyes" has the best defect detection, but only if you have some formal process that makes them actually look.
Every application gets tested.
Some applications get tested in the form of does my code compile and does the code appear to function.
Some applications get tested with Unit tests. Some developers are religious about Unit tests, TDD and code coverage to a fault. Like everything, too much is more often than not bad.
Some applications are luckily enough to get tested via a QA team. Some QA teams automate their testing, others write test cases and manually test.
Michael Feathers, who wrote Working Effectively with Legacy Code, wrote that code not wrapped in tests is legacy code. Until you have experienced the Big Ball of Mud, I don't think any developer truly understands the benefit of good application architecture and a suite of well-written unit tests.
Having different people test is a great idea. The more people that can look at an application the more likely all the scenarios will get covered, including the ones you didn't intend to happen.
TDD has gotten a bad rap lately. When I think of TDD I think of dogmatic developers meticulously writing tests before they write the implementation. While this is true, what has been overlooked is by writing the tests, (first or shortly after) the developer experiences the method/class in the shoes of the consumer. Design flaws and shortcomings are immediately apparent.
I argue that the size of the project is irrelevant. What is important is the lifespan of the project. The longer a project lives the more the likelihood that a developer other than the one who wrote it will work on it. Unit tests are documentation to the expectations of the application -- A manual of sorts.
Unit tests can only go so far (as can all other types of testing). I look on testing as a kind of "sieve" process. Each different type of testing is like a sieve that you are placing under the outlet of your development process. The stuff that comes out is (hopefully) mostly features for your software product, but it also contains bugs. The bugs come in lots of different shapes and sizes.
Some of the bugs are pretty easy to find because they are big or get caught in basically any kind of sieve. On the other hand, some bugs are smooth and shiny, or don't have a lot of hooks on the sides so they would slip through one type of sieve pretty easily. A different type of sieve might have different shape or size holes so it will be able to catch different types of bugs. The more sieves you have, the more bugs you will catch.
Obviously the more sieves you have in the way, the slower it is for the features to get through as well, so you'll want to try to find a happy medium where you aren't spending too much time testing that you never get to release any software.
The nicest point (IMO) of automated unit tests is that when you change (improve, refactor) the existing code, it's easy to test that you didn't break it. It would be tedious to test everything manually again and again.
Your question seems to be more about automated testing vs manual testing. Unit testing is a form of automated testing but a very specific form.
Your remark about having separate testers and developers is right on the mark though. But that doesn't mean developers shouldn't do some form of verification.
Unit testing is a way for developers to get fast feedback on what they're doing. They write tests to quickly run small units of code and verify their correctness. It's not really testing in the sense you seem to use the word testing just like a syntax check by a compiler isn't testing. Unit testing is a development technique. Code that's been written using this technique is probably of higher quality than code written without but still has to go through quality control.
The question about automated testing vs manual testing for the test department is easier to answer. Whenever the project gets big enough to justify the investment of writing automated tests you should use automated tests. When you've got lots of small one-time tests you should do them manually.
Having been on both sides, QA and development, I would assert that someone should always manually test your code. Even if you are using TDD, there are plenty of things that you as a developer may not be able to cover with unit tests, or may not think about testing. This especially includes usability and aesthetics. Aesthetics includes proper spelling, grammar, and formatting of output.
Real life example 1:
A developer was creating a report we display on our intranet for managers. There were many formulas, all of which the developer tested before the code came to QA. We verified that the formulas were, indeed, producing the correct output. What we asked development to correct, almost immediately, was the fact that the numbers were displayed in pink on a purple background.
Real life example 2:
I write code in my spare time, using TDD. I like to think I test it thoroughly. One day my wife walked by when I had a message dialog up, read it, and promptly asked, "What on Earth is that message supposed to mean?" I thought the message was rather clear, but when I reread it I realized it was talking about parent and child nodes in a tree control, and probably wouldn't make sense to the average user. I reworded the message. In this case, it was a usability issue, which was not caught by my own testing.
unit-testing seems effective for larger projects where the APIs need to be industrial strength, it seems possibly like overkill on smaller projects.
It's true that unit tests of a moving API are brittle, but unit-testing is also effective on API-less projects such as applications. Unit-testing is meant to test the units a project is made with. It allows ensuring every unit works as expected. This is a real safety net when modifying - refactoring - the code.
As far as the size of the project is concerned, it's true that writing unit-tests for a small project can be overkill. And here, I would define a small project as a small program that can be tested manually, but very easily and quickly, in no more than a few seconds. Also, a small project can grow, in which case it might be advantageous to have unit tests at hand.
there was a value in having the developers and testers be different people, and that the tension between these two groups could help create a great product in the end.
Whatever the development process, unit-testing is not meant to supersede any other stage of testing, but to complement them with tests at the development level, so that developers can get very early feedback without having to wait for an official build and official test. With unit-testing, the development team delivers code that works downstream: not bug-free code, but code that can be tested by the test team(s).
To sum up, I test manually when it's really very easy, or when writing unit tests is too complex, and I don't aim to 100% coverage.
I believe it is possible to combine the expertise of QA/testing staff (defining the tests / acceptance criteria), with the TDD concept of using a developer owned API (as oppose to GUI or HTTP/messaging interface) to drive an application under test.
It is still critical to have independent QA staff, but we don't need huge manual test teams anymore with modern test tools like FitNesse, Selenium and Twist.
Just to clarify something many people seem to miss:
TDD, in the sense of
"write failing test, write code to make test pass, refactor, repeat"
Is usually most efficient and useful when you write unit tests.
You write a unit test around just the class/function/unit of code you are working on, using mocks or stubs to abstract out the rest of the system.
"Automated" testing usually refers to higher level integration/acceptance/functional tests - you can do TDD around this level of testing, and it's often the only option for heavily ui-driven code, but you should be aware that this sort of testing is more fragile, harder to write test-first, and no substitute for unit testing.
TDD gives me, as the developer, confidence that the change I am making to the code has the intended consequences and ONLY the intended consequences, and thus the metaphor of TDD as a "safety net" is useful; change any code in a system without it and you can have no idea what else you may have broken.
Engineering tension between developers and testers is really bad news; developers cultivate a "well, the testers are paid to find the bugs" mindset (leading to laziness) and the testers -- feeling as if they aren't being seen to do their jobs if they don't find any faults -- throw up as many trivial problems as they can. This is a gross waste of everyone's time.
The best software development, in my humble experience, is where the tester is also a developer and the unit tests and code are written together as part of a pair programming exercise. This immediately puts the two people on the same side of the problem, working together towards the same goal, rather than putting them in opposition to each other.
Unit testing is not the same as functional testing. And as far as automation is concerned, it should normally be considered when the testing cycle will be repeated more than 2 or 3 times... It is preferred for regression testing. If the project is small or will not have frequent changes or updates, then manual testing is the better and less costly option. In such cases automation will prove to be more costly because of the script writing and maintenance.

How is unit testing better than just testing the entire output of your application as a whole?

I don't understand how a unit test could possibly be of benefit.
Isn't it sufficient for a tester to test the entire output as a whole rather than doing unit tests?
Thanks.
What you are describing is integration testing. What integration testing will not tell you is which piece of your massive application is not working correctly when your output is no longer correct.
The advantage to unit testing is that you can write a test for each business assumption or algorithm step that you need your program to perform. When someone adds or changes code in your application, you immediately know exactly which step, which piece, and maybe even which line of code is broken when a bug is introduced. The time savings on maintenance for that reason alone make it worthwhile, but there is an even bigger advantage in that regression bugs cannot be introduced (assuming your tests are running automatically when you build your software). If you fix a bug, and then write a test specifically to catch that bug in the future, there is no way someone could accidentally introduce it again.
The combination of integration testing and unit testing can let you sleep much easier at night, especially when you've checked in a big piece of code that day.
The earlier you catch bugs, the cheaper they are to fix. A bug found during unit testing by the coder is pretty cheap (just fix the darn thing).
A bug found during system or integration testing costs more, since you have to fix it and restart the test cycle.
A bug found by your customer will cost a lot: recoding, retesting, repackaging and so forth. It may also leave a painful boot print on your derriere when you inform management that you didn't catch it during unit testing because you didn't do any, thinking that the system testers would find all the problems :-)
How much money would it cost GM to recall 10,000 cars because the catalytic converter didn't work properly?
Now think of how much it would cost them if they discovered that immediately after those converters were delivered to them, but before they were put into those 10,000 cars.
I think you'll find the latter option to be quite a bit cheaper.
That's one reason why test driven development and continuous integration are (sometimes) a good thing - testing is done all the time.
In addition, unit tests don't check that the program works as a whole, just that each little bit performs as expected. That's often quite a lot more than higher level tests would check.
From my experience:
Integration and functional testing tend to be more indicative of the overall quality of the system than the unit test suite is.
High level testing (functional, acceptance) is a QA tool.
Unit testing is a development tool. Especially in a TDD context, where the unit test becomes more of a design implement than a quality-assurance one.
As a result of better design, quality of the entire system improves (indirectly).
Passing unit test suite is meant to ensure that a single component conforms to the developer's intentions (correctness). Acceptance test is the level that covers validity of the system (i.e. system does what user want it to do).
Summary:
Unit test is meant as a development tool first, QA tool second.
Acceptance test is meant as a QA tool.
There is still a need for a certain level of manual testing, but unit testing is used to decrease the number of defects that make it to that stage. Unit testing exercises the smallest parts of the system, and if they all work, the chances of the application as a whole working correctly increase significantly.
It also assists when adding new features since regression testing can be performed quickly and automatically.
For a complex enough application, testing the entire output as a whole may not cover enough different possibilities. Any given application has a huge number of different code paths that can be followed depending on input. In typical whole-application testing, many parts of your code are simply never exercised, because they are only used in certain circumstances, so you can't be sure that code which isn't run in your test scenario actually works. Also, errors in one section of code may be masked most of the time by something in another section, so you may never discover some errors.
It is better to test each function or class separately. That way, the test is easier to write, because you are only testing a certain small section of the code. It's also easier to cover every possible code path when testing, and if you test each small part separately then you can detect errors even when those errors would often be masked by other parts of your code when run in your application.
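As a minimal sketch (the shippingCost function and its rates are invented for the example), each code path gets its own small, direct test, so a mistake in one rarely-used branch can't hide behind the rest of the application:

    import org.junit.jupiter.api.Test;
    import static org.junit.jupiter.api.Assertions.assertEquals;
    import static org.junit.jupiter.api.Assertions.assertThrows;

    class ShippingCostTest {

        // Made-up function with three distinct code paths.
        static double shippingCost(double weightKg) {
            if (weightKg <= 0) {
                throw new IllegalArgumentException("weight must be positive");
            }
            if (weightKg < 1.0) {
                return 4.99;                         // small-parcel flat rate
            }
            return 4.99 + (weightKg - 1.0) * 2.00;   // heavier parcels pay per kilo
        }

        @Test
        void smallParcelsPayTheFlatRate() {
            assertEquals(4.99, shippingCost(0.5), 0.0001);
        }

        @Test
        void heavyParcelsPayPerKilo() {
            assertEquals(8.99, shippingCost(3.0), 0.0001);
        }

        @Test
        void nonPositiveWeightsAreRejected() {
            assertThrows(IllegalArgumentException.class, () -> shippingCost(0));
        }
    }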
Do yourself a favor and try out unit testing first. I was quite the skeptic myself until I realized just how darned helpful/powerful unit tests can be. If you think about it, they aren't really there to add to your workload. They are there to provide you with peace of mind and allow you to continue extending your application while ensuring that your code is solid. You get immediate feedback when you may have broken something, and that is something of extraordinary value.
To your question about why to test small sections of code, consider this: suppose your giant app uses a cool XOR encryption scheme that you wrote, and eventually product management changes the requirements for how you generate these encrypted strings. So you say: "Heck, I wrote the encryption routine, so I'll go ahead and make the change. It'll take me 15 minutes and we'll all go home and have a party." Well, perhaps you introduced a bug during this process. But wait!!! Your handy-dandy TestXOREncryption() test method immediately tells you that the expected output did not match the input. Bingo, this is why you broke your unit tests down ahead of time into small "units": in your big giant application you would not have figured this out nearly as fast.
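A hedged sketch of what such a test might look like (JUnit 5; the xor routine and the test name are invented here, standing in for the answer's hypothetical TestXOREncryption()):

    import org.junit.jupiter.api.Test;
    import static org.junit.jupiter.api.Assertions.assertArrayEquals;
    import java.nio.charset.StandardCharsets;

    class XorEncryptionTest {

        // Toy routine standing in for the "cool XOR encryption scheme".
        static byte[] xor(byte[] input, byte[] key) {
            byte[] out = new byte[input.length];
            for (int i = 0; i < input.length; i++) {
                out[i] = (byte) (input[i] ^ key[i % key.length]);
            }
            return out;
        }

        // If a "15-minute" change to the routine breaks the key handling,
        // this round-trip check fails the next time the suite runs.
        @Test
        void encryptingTwiceWithTheSameKeyRestoresThePlaintext() {
            byte[] plaintext = "attack at dawn".getBytes(StandardCharsets.UTF_8);
            byte[] key = "s3cr3t".getBytes(StandardCharsets.UTF_8);
            assertArrayEquals(plaintext, xor(xor(plaintext, key), key));
        }
    }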
Also, once you get into the frame of mind of regularly writing unit tests, you'll realize that although you pay an upfront cost in time, you'll get it back tenfold later in the development cycle when you can quickly identify the areas of your code that have introduced problems.
There is no magic bullet with unit tests because your ability to identify problems is only as good as the tests you write. It boils down to delivering a better product and relieving yourself of stress and headaches. =)
Agree with most of the answers. Let's drill down on the topic of speed. Here are some real numbers:
Unit test results arrive in 1 or 2 minutes from a fresh compile. As true unit tests (no interaction with external systems like databases) they can cover a lot of logic really fast.
Automated functional test results in 1 or 2 hours. These run on a simplified platform, but sometimes cover multiple systems and the database - which really kills the speed.
Automated integration test results once a day. These exercise the full meal deal, but are so heavy and slow, we can only execute them once a day and it takes a few hours.
Manual regression results come in after a few weeks. We hand stuff over to testers a few times a day, but your change isn't realistically regression-tested for a week or two at best.
I want to find out what I broke in 1 or 2 minutes, not a few weeks, not even a few hours. That's where the tenfold ROI on unit tests that people talk about comes from.
This is a tough question to approach because it questions something of such enormous breadth. Here's my short answer, however:
Test Driven Development (or TDD) seeks to prove that every logical unit of an application (or block of code) functions exactly as it should. By making tests as automated as possible for productivity's sake, how could this really be harmful?
By testing every logical piece of code, you can trust the usage of the code further up the hierarchy. Say I build an application that relies on a thread-safe stack implementation. Shouldn't the stack be guaranteed to work at every stage before I build on it?
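To make that concrete, here is a minimal sketch (JUnit 5, with a toy ThreadSafeStack defined inline purely for illustration) of the kind of test that earns that guarantee:

    import org.junit.jupiter.api.Test;
    import static org.junit.jupiter.api.Assertions.assertEquals;
    import java.util.ArrayDeque;
    import java.util.Deque;

    class ThreadSafeStackTest {

        // Minimal stand-in for the stack the application would depend on.
        static class ThreadSafeStack<T> {
            private final Deque<T> items = new ArrayDeque<>();
            synchronized void push(T item) { items.push(item); }
            synchronized int size() { return items.size(); }
        }

        // Every item pushed from every thread must be accounted for; without
        // the synchronisation this test would start failing intermittently.
        @Test
        void concurrentPushesAreNotLost() throws InterruptedException {
            ThreadSafeStack<Integer> stack = new ThreadSafeStack<>();
            int threads = 8;
            int pushesPerThread = 1_000;
            Thread[] workers = new Thread[threads];
            for (int t = 0; t < threads; t++) {
                workers[t] = new Thread(() -> {
                    for (int i = 0; i < pushesPerThread; i++) {
                        stack.push(i);
                    }
                });
                workers[t].start();
            }
            for (Thread worker : workers) {
                worker.join();
            }
            assertEquals(threads * pushesPerThread, stack.size());
        }
    }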
The key is that if something breaks somewhere in the whole application and all you can see is the total output, how do you know where it came from? Debugging, of course! Which puts you right back where you started. TDD lets you (hopefully) bypass this most painful stage of development.
Testers generally test end to end functionality. Obviously this is geared for going at user scenarios and has incredible value.
Unit tests serve a different purpose. They are the developer's way of verifying that the components they write work correctly in the absence of other features or in combination with them. This offers a range of value, including:
Documentation that can't be ignored
The ability to isolate bugs to specific components
Verification of invariants in the code
Quick, immediate feedback on changes to the code base.
One place to start is regression testing. Once you find a bug, write a small test that demonstrates the bug, fix it, then make sure the test now passes. In future you can run that test before each release to ensure that the bug has not been reintroduced.
Why do that at the unit level instead of the whole-program level? Speed. In good code it's much faster to isolate a small unit and write a tiny test than to drive a complex program through to the point of the bug. And when the tests run, a unit test will generally execute significantly faster than an integration test.
Very simply: Unit tests are easier to write, since you're only testing a single method's functionality. And bugs are easier to fix, since you know exactly what method is broken.
But like the other answerers have pointed out, unit tests aren't the end-all-be-all of testing. They're just the smallest piece of the equation.
Probably the single biggest difficulty with software is the sheer number of interacting things, and the most useful technique is to reduce the number of things that have to be considered.
For example, using higher-level languages rather than lower-level improves productivity, because one line is a separate thing, and being able to write a program in fewer lines reduces the number of things.
Procedural programming came about as an attempt to reduce complexity by making it possible to treat a function as a thing. In order to do that, though, we have to be able to think about what the function does in a coherent manner, and with confidence that we're right. (Object-oriented programming does a similar thing, on a larger scale.)
There are several ways to do this. Design-by-contract is a way of exactly specifying what the function does. Using function parameters rather than global variables to call the function and get results reduces the complexity of the function.
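A small, hypothetical sketch of that last point: the version that reads a global is awkward to test in isolation because its result depends on hidden state, while the version that takes its inputs as parameters can be verified with a one-line assertion.

    import org.junit.jupiter.api.Test;
    import static org.junit.jupiter.api.Assertions.assertEquals;

    class TaxCalculationTest {

        static double currentRate = 0.2;             // hidden global input

        // Harder to reason about: the result depends on state set elsewhere.
        static double taxUsingGlobal(double amount) {
            return amount * currentRate;
        }

        // Everything the function needs arrives through its parameters,
        // so the test below says essentially all there is to say about it.
        static double tax(double amount, double rate) {
            return amount * rate;
        }

        @Test
        void taxIsAmountTimesRate() {
            assertEquals(20.0, tax(100.0, 0.2), 0.0001);
        }
    }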
Unit testing is one way to verify that the function does what it is supposed to. It's usually possible to test all the code in a function, and sometimes all the execution paths. It is a way to tell if the function works as it should or not. If the function works, we can think about it as a single thing, rather than as multiple things we have to keep track of.
It serves other purposes. Unit tests are usually quick to run, and so can catch bugs quickly, when they're easy to fix. If developers make sure a function passes the tests before being checked in, then the tests are a form of documenting what the function does that is guaranteed correct. The act of creating the tests forces the test writer to think about what the function should be doing. After that, whoever wanted the change can look at the tests to see if he or she was properly understood.
By way of contrast, larger tests are not exhaustive, and so can easily miss lots of bugs. They're bad at localizing bugs. They are usually run at fairly long intervals, so they may detect a bug some time after it was introduced. They exercise parts of the total user experience, but provide no basis for reasoning about any individual part of the system. They should not be neglected, but they are not a substitute for unit tests.
As others have stated, the length of the feedback loop and isolation of the problem to a specific component are key benefits of Unit Tests.
Another way that they are complementary to functional tests is how coverage is tracked in some organizations:
Unit tests on code coverage
Functional tests on requirements coverage
Functional tests might miss features that were implemented but are not in the spec.
Being based on the code, unit tests might miss that a certain feature wasn't implemented at all, which is where the requirements-based coverage analysis of functional testing comes in.
A final point: there are some things that are easier and faster to test at the unit level, especially error scenarios.
Unit testing will help you identify the source of your bug more clearly and let you know that you have a problem earlier. Both are good to have, but they are different, and unit testing does have benefits.
The software you test is a system. When you test it as a whole you are black-box testing, since you primarily deal with inputs and outputs. Black-box testing is great when you have no means of getting inside the system.
But since you usually do, you create a lot of unit tests that actually test your system as a white box. You can slice the system open in many ways and organize your tests according to its internal structure. White-box testing gives you many more ways of testing and analyzing a system. It's clearly complementary to black-box testing and should not be considered an alternative or competing methodology.