Pitfalls of code coverage [closed]

Pitfalls of code coverage [closed] - unit-testing

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I'm looking for real world examples of some bad side effects of code coverage.
I noticed this happening at work recently because of a policy to achieve 100% code coverage. Code quality has been improving for sure but conversely the testers seem to be writing more lax test plans because 'well the code is fully unit tested'. Some logical bugs managed to slip through as a result. They were a REALLY BIG PAIN to debug because 'well the code is fully unit tested'.
I think that was partly because our tool did statement coverage only. Still, it could have been time better spent.
If anyone has other negative side effects of having a code coverage policy please share. I'd like to know what kind of other 'problems' are happening out there in the real-world.
Thanks in advance.
EDIT: Thanks for all the really good responses. There are a few which I would mark as the answer but I can only mark one unfortunately.

In a sentence: Code coverage tells you what you definitely haven't tested, not what you have.
Part of building a valuable unit test suite is finding the most important, high-risk code and asking hard questions of it. You want to make sure the tough stuff works as a priority. Coverage figures have no notion of the 'importance' of code, nor the quality of tests.
In my experience, many of the most important tests you will ever write are the tests that barely add any coverage at all (edge cases that add a few extra % here and there, but find loads of bugs).
The problem with setting hard and (potentially counter-productive) coverage targets is that developers may have to start bending over backwards to test their code. There's making code testable, and then there's just torture. If you hit 100% coverage with great tests then that's fantastic, but in most situations the extra effort is just not worth it.
Furthermore, people start obsessing/fiddling with numbers rather than focussing on the quality of the tests. I've seen badly written tests that have 90+% coverage, just as I've seen excellent tests that only have 60-70% coverage.
Again, I tend to look at coverage as an indicator of what definitely hasn't been tested.

Just because there's code coverage doesn't mean you're actually testing all paths through the function.
For example, this code has four paths:
if (A) { ... } else { ... }
if (B) { ... } else { ... }
However just two tests (e.g. one with A and B true, one with A and B false) would give "100% code coverage."
This is a problem because the tendency is to stop testing once you've achieved the magic 100% number.

In my experience, the biggest problem with code coverage tools is the risk that somebody will fall victim to the belief that "high code coverage" equals "good testing." Most coverage tools just offer statement coverage metrics, as opposed to condition, data path or decision coverage. That means that it's possible to get 100% coverage on a bit of code like this:
for (int i = 0; i < MAX_RETRIES; ++i) {
if (someFunction() == MAGIC_NUMBER) {
break;
}
}
... without ever testing the termination condition on the for loop.
Worse, it's possible to get very high "coverage" from a test that simply invokes your application, without bothering to validate the output, or validating it incorrectly.
Simply put, low code coverage levels is certainly an indication of insufficient testing, but high coverage levels are not an indication of sufficient or correct testing.

Sometimes corner cases are so rare they're not worth testing, yet a strict code-coverage rule requires you test it anyway.
For example, in Java the MD5 algorithm is built-in, but technically it's possible that an "unsupported algorithm" type exception is thrown. It's never thrown and your test would have to go through significant gyrations to test that path.
It would be a lot of work wasted.

In my opinion, the greatest danger a team runs from measuring code coverage is that it rewards large tests, and penalizes small ones. If you have the choice between writing a single test that covers a large portion of your application's functionality, and writing ten small tests which test a single method, only measuring code coverage implies that you should write the large test.
However, writing the set of 10 small tests will give you much less brittle tests, and will test your application much more thoroughly than the one large test will. Thus, by measuring code coverage, particularly in an organization with still evolving testing habits, you can often set up the wrong incentives.

I know this isn't a direct answer to your question, but...
Any testing, regardless of what type, is insufficient by itself. Unit testing/code coverage is for developers. QA still needs to test the system as a whole. Business users still need to test the system as a whole as well.
The converse, QA tests the code completely, so developers shouldn't test is equally as bad. Testing is complimentary and different tests provide different things. Each test type can miss things that another might find.
Just like the rest of development, don't take shortcuts with testing, it'll only let bugs through.

Writing too targeted test cases.
Insufficient input variability testing of the Code
Large number of artificial test cases executed.
Not concentrating on the important test failures due to noise.
Difficulty in assigning defects because many conditions from many components must interact for a line to execute.
The worst side effect of having a 100% coverage goal is to spend a lot of the testing development cycle (75%+) hiting corner cases. Another poor effect of such a policy is the concentration of hitting a particular line of code rather than addressing the range of inputs. I don't really care that the strcpy function ran at least once. I really care that it ran against a wide variety of input. Having a policy is good. But having any extremely draconian policy is bad. The 100% metric of code coverage is neither necessary nor sufficient for code to be considered solid.

One of the largest pitfalls of code coverage is that people just talk about code coverage without actually specifying what type of code coverage they are talking about. The characteristics of C0, C1, C2 and even higher levels of code coverage are very different, so just talking about "code coverage" doesn't even make sense.
For example, achieving 100% full path coverage is pretty much impossible. If your program has n decision points, you need 2n tests (and depending on the definition, every single bit in a value is a decision point, so to achieve 100% full path coverage for an extremely simple function that just adds two ints, you need 18446744073709551616 tests). If you only have one loop, you already need infinitely many tests.
OTOH, achieving 100% C0 coverage is trivial.
Another important thing to remember, is that code coverage does not tell you what code was tested. It only tells you what code was run! You can try it out yourself: take a codebase that has 100% code coverage. Remove all the assertions from the tests. Now the codebase still has 100% coverage, but does not test a single thing! So, code coverage does not tell you what's tested, only what's not tested.

There are tools out there, Jumble for one, that perform analysis through branch coverage, by mutating your code to see if your test fails for all different permutations.
Directly from their website:
Jumble is a class level mutation
testing tool that works in conjunction
with JUnit. The purpose of mutation
testing is to provide a measure of the
effectiveness of test cases. A single
mutation is performed on the code to
be tested, the corresponding test
cases are then executed. If the
modified code fails the tests, then
this increases confidence in the
tests. Conversely, if the modified
code passes the tests this indicates a
testing deficiency.

Nothing wrong with code coverage - what I see wrong is the 100% figure. At some point the law of diminished returns kicks in and it becomes more expensive to test the last 1% than the other 99%. Code coverage is a worthy goal but common sense goes a long way.

!00% code coverage means well tested code is a complete myth. As developers we know the hard/complex/delicate parts of a system, and I would much rather see those areas properly tested, and only get 50% coverage, rather than the meaningless figure that every line has been run at least once.
In terms of a real world example, the only team that I was on that had 100% coverage wrote some of the worst code I've ever seen. 100% coverage was used to replace code review - the result was predicatably awful, to the extent that most code was thrown away, even though it passed the tests.

We have good tools for measuring code-coverage from unit tests. So it's tempting to rely on code-coverage of 100% to represent that you're "done testing." This is not true.
As other folks have mentioned, 100% code coverage doesn't prove that you have tested adequately, nor does 50% code coverage necessarily mean that you haven't tested adequately.
Measuring lines of code executed by tests is just one metric. You also have to test for a reasonable variety of function inputs, and also how the function or class behaves depending on some other external state. For example, some code functions differently based on the data in a database or in a file.
I've also blogged about this recently: http://karwin.blogspot.com/2009/02/unit-test-coverage.html

100% code coverage doesn't mean you're done with usnit tests
function int divide(int a, int b) {
return a/b;
}
With just 1 unit test, I get 100% code coverage for this function:
return divide(4,2) == 2;
Now, nobody would argue that this unit code with 100% coverage indicates that he feature works just fine.
I think code coverage is a good element to know if you are missing any obvious code path, but I would use it carefully.

Related

Is 100% code coverage a really good thing when doing unit tests? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
The community reviewed whether to reopen this question 3 months ago and left it closed:
Original close reason(s) were not resolved
Improve this question
I always learned that doing maximum code coverage with unit tests is good. I also hear developers from big companies such as Microsoft saying that they write more lines of testing code than the executable code itself.
Now, is it really great? Doesn't it seem sometimes like a complete loss of time which has an only effect to making maintenance more difficult?
For example, let's say I have a method DisplayBooks() which populates a list of books from a database. The product requirements tell that if there are more than one hundred books in the store, only one hundred must be displayed.
So, with TDD,
I will start by making an unit test BooksLimit() which will save two hundred books in the database, call DisplayBooks(), and do an Assert.AreEqual(100, DisplayedBooks.Count).
Then I will test if it fails,
Then I'll change DisplayBooks() by setting the limit of results to 100, and
Finally I will rerun the test to see if it succeeds.
Well, isn't it much more easier to go directly to the third step, and do never make BooksLimit() unit test at all? And isn't it more Agile, when requirements will change from 100 to 200 books limit, to change only one character, instead of changing tests, running tests to check if it fails, changing code and running tests again to check if it succeeds?
Note: lets assume that the code is fully documented. Otherwise, some may say, and they would be right, that doing full unit tests will help to understand code which lacks documentation. In fact, having a BooksLimit() unit test will show very clearly that there is a maximum number of books to display, and that this maximum number is 100. Stepping into the non-unit-tests code would be much more difficult, since such limit may be implemented though for (int bookIndex = 0; bookIndex < 100; ... or foreach ... if (count >= 100) break;.

Well, isn't it much more easier to go directly to the third step, and do never make BooksLimit() unit test at all?
Yes... If you don't spend any time writing tests, you'll spend less time writing tests. Your project might take longer overall, because you'll spend a lot of time debugging, but maybe that's easier to explain to your manager? If that's the case... get a new job! Testing is crucial to improving your confidence in your software.
Unittesting gives the most value when you have a lot of code. It's easy to debug a simple homework assignment using a few classes without unittesting. Once you get out in the world, and you're working in codebases of millions of lines - you're gonna need it. You simply can't single step your debugger through everything. You simply can't understand everything. You need to know that the classes you're depending on work. You need to know if someone says "I'm just gonna make this change to the behavior... because I need it", but they've forgotten that there's two hundred other uses that depend on that behavior. Unittesting helps prevent that.
With regard to making maintenance harder: NO WAY! I can't capitalize that enough.
If you're the only person that ever worked on your project, then yes, you might think that. But that's crazy talk! Try to get up to speed on a 30k line project without unittests. Try to add features that require significant changes to code without unittests. There's no confidence that you're not breaking implicit assumptions made by the other engineers. For a maintainer (or new developer on an existing project) unittests are key. I've leaned on unittests for documentation, for behavior, for assumptions, for telling me when I've broken something (that I thought was unrelated). Sometimes a poorly written API has poorly written tests and can be a nightmare to change, because the tests suck up all your time. Eventually you're going to want to refactor this code and fix that, but your users will thank you for that too - your API will be far easier to use because of it.
A note on coverage:
To me, it's not about 100% test coverage. 100% coverage doesn't find all the bugs, consider a function with two if statements:
// Will return a number less than or equal to 3
int Bar(bool cond1, bool cond2) {
int b;
if (cond1) {
b++;
} else {
b+=2;
}
if (cond2) {
b+=2;
} else {
b++;
}
}
Now consider I write a test that tests:
EXPECT_EQ(3, Bar(true, true));
EXPECT_EQ(3, Bar(false, false));
That's 100% coverage. That's also a function that doesn't meet the contract - Bar(false, true); fails, because it returns 4. So "complete coverage" is not the end goal.
Honestly, I would skip tests for BooksLimit(). It returns a constant, so it probably isn't worth the time to write them (and it should be tested when writing DisplayBooks()). I might be sad when someone decides to (incorrectly) calculate that limit from the shelf size, and it no longer satisfies our requirements. I've been burned by "not worth testing" before. Last year I wrote some code that I said to my coworker: "This class is mostly data, it doesn't need to be tested". It had a method. It had a bug. It went to production. It paged us in the middle of the night. I felt stupid. So I wrote the tests. And then I pondered long and hard about what code constitutes "not worth testing". There isn't much.
So, yes, you can skip some tests. 100% test coverage is great, but it doesn't magically mean your software is perfect. It all comes down to confidence in the face of change.
If I put class A, class B and class C together, and I find something that doesn't work, do I want to spend time debugging all three? No. I want to know that A and B already met their contracts (via unittests) and my new code in class C is probably broken. So I unittest it. How do I even know it's broken, if I don't unittest? By clicking some buttons and trying the new code? That's good, but not sufficient. Once your program scales up, it'll be impossible to rerun all your manual tests to check that everything works right. That's why people who unittest usually automate running their tests too. Tell me "Pass" or "Fail", don't tell me "the output is ...".
OK, gonna go write some more tests...

100% unit test coverage is generally a code smell, a sign that someone has come over all OCD over the green bar in the coverage tool, instead of doing something more useful.
Somewhere around 85% is the sweet spot, where a test failing more often that not indicates an actual or potential problem, rather than simply being an inevitable consequence of any textual change not inside comment markers. You are not documenting any useful assumptions about the code if your assumptions are 'the code is what it is, and if it was in any way different it would be something else'. That's a problem solved by a comment-aware checksum tool, not a unit test.
I wish there was some tool that would let you specify the target coverage. And then if you accidentally go over it, show things in yellow/orange/red to push you towards deleting some of the spurious extra tests.

When looking at an isolated problem, you're completely right. But unit tests are about covering all the intentions you have for a certain piece of code.
Basically, the unit tests formulate your intentions. With a growing number of intentions, the behavior of the code to be tested can always be checked against all intentions made so far. Whenever a change is made, you can prove that there is no side-effect which breaks existing intentions. Newly found bugs are nothing else but an (implicit) intention which is not held by the code, so that you formulate your intention as new test (which fails at first) and the fix it.
For one-time code, unit tests are indeed not worth the effort because no major changes are expected. However, for any block of code which is to be maintained or which serves as component for other code, warranting that all intentions are held for any new version is worth a lot (in terms of less effort for manually trying to check for side effects).
The tipping point where unit tests actually save you time and therefore money depends on the complexity of the code, but there always is a tipping point which usually is reached after only few iterations of changes. Also, last but not least it allows you to ship fixes and changes much faster without compromising the quality of your product.

There is no exlpicit relation between code coverage and good software. You can easily imagine piece of code that has 100%(or close) code coverage and it still contains a lot of bugs. (Which does not mean that tests are bad!)
Your question about agility of 'no test at all' approach is a good one only for short perspective (which means it is most likely not good if you plan to build your program for longer time). I know from my experience that such simple tests are very useful when your project gets bigger and bigger and at some stage you need to make significant changes. This can be a momment when you'll say to yourself 'It was a good decision to spend some extra minutes to write that tiny test that spotted bug I just introduced!".
I was a big fan of code coverage recently but now it turned (luckilly) to something like 'problems coverage' approach. It means that your tests should cover all problems and bugs that were spotted not just 'lines of code'. There is no need to do a 'code coverage race'.
I understand 'Agile' word in terms of number tests as 'number of tests that helps me build good software and not waste time to write unnecessary piece of code' rather than '100% coverage' or 'no tests at all'. It's very subjective and it based on your experience, team, technology and many others factors.
The psychological side effect of '100% code coverage' is that you may think that your code has no bugs, which never is true:)

Sorry for my english.
100% code coverage is a managerial physiological disorder to impress stakeholders artificially. We do testing because there is some complex code out there which can lead to defects. So we need to make sure that the complex code has a test case, its tested and the defects are fixed before its live.
We should aim at test something which is complex and not just everything. Now this complex needs to be expressed in terms of some metric number which can be ether cyclomatic complexity , lines of code , aggregations , coupling etc. or its probably culmination of all the above things. If we find that metric higher we need to ensure that , that part of the code is covered. Below is my article which covers what is the best % for code coverage.
Is 100% code coverage really needed ?

I agree with #soru, 100% test coverage is not a rational goal.
I do not believe that any tool or metric can exist that can tell you the "right" amount of coverage. When I was in grad school, my Thesis advisor's work was on designing test coverage for "mutated" code. He's take a suite of tests, and then run an automated program to make errors in the source code under test. The idea was that the mutated code contained errors that would be found in the real world, and thus a test suite that found the highest percentage of broken code was the winner.
While his thesis was accepted, and he is now a Professor at a major engineering school, he did not find either:
1) a magic number of test coverage that is optimal
2) any suite that could find 100% of the errors.
Note, the goal is to find 100% of the errors, not to find 100% coverage.
Whether #soru's 85% is right or not is a subject for discussion. I have no means to assess if a better number would be 80% or 90% or anything else. But as a working assessment, 85% feels about right to me.

First 100% is hard to get especially on big projects ! and even if you do when a block of code is covered it doesn't mean that it is doing what it is supposed to unless your test are asserting every possible input and output (which is Almost impossible).
So i wouldn't consider a piece of software to be good simply because it has 100% code coverage but code coverage still a good thing to have.
Well, isn't it much more easier to go directly to the third step, and do never make BooksLimit() unit test at all?
well having that test there makes you pretty confident that if someone changes the code and the test fails you will notice that something is wrong with the new code therefore you avoid any potencial bug in your application

When the client decides to change the limit to 200, good luck finding bugs related to that seemingly trivial test. Specially, when you have other 100 variables in your code, and there are other 5 developers working on code that relies on that tiny piece of information.
My point: if it's valuable to the business value (or, if you dislike the name, to the very important core of the project), test it. Only discard when there is no possible (or cheap) way of testing it, like UI or user interaction, or when you are sure the impact of not writing that test is minimal. This holds truer for projects with vague, or quickly changing requirements [as I painfully discovered].
For the other example you present, the recommended is to test boundary values. So you can limit your tests to only four values: 0, some magical number between 0 and BooksLimit, BooksLimit, and some number higher.
And, as other people said, make tests, but be 100% positive something else can fail.

Unit testing code coverage - do you have 100% coverage?

Do your unit tests constitute 100% code coverage? Yes or no, and why or why not.

No for several reasons :
It is really expensive to reach the 100% coverage, compared to the 90% or 95% for a benefit that is not obvious.
Even with 100% of coverage, your code is not perfect. Take a look at this method (in fact it depends on which type of coverage you are talking about - branch coverage, line coverage...):
public static String foo(boolean someCondition) {
String bar = null;
if (someCondition) {
bar = "blabla";
}
return bar.trim();
}
and the unit test:
assertEquals("blabla", foo(true));
The test will succeed, and your code coverage is 100%. However, if you add another test:
assertEquals("blabla", foo(false));
then you will get a NullPointerException. And as you were at 100% with the first test, you would have not necessarily write the second one!
Generally, I consider that the critical code must be covered at almost 100%, while the other code can be covered at 85-90%

To all the 90% coverage tester:
The problem with doing so is that the 10% hard to test code is also the not-trivial code that contains 90% of the bug! This is the conclusion I got empirically after many years of TDD.
And after all this is pretty straightforward conclusion. This 10% hard to test code, is hard to test because it reflect tricky business problem or tricky design flaw or both. These exact reasons that often leads to buggy code.
But also:
100% covered code that decreases with time to less than 100% covered often pinpoints a bug or at least a flaw.
100% covered code used in conjunction with contracts, is the ultimate weapon to lead to live close to bug-free code. Code Contracts and Automated Testing are pretty much the same thing
When a bug is discovered in 100% covered code, it is easier to fix. Since the code responsible for the bug is already covered by tests, it shouldn't be hard to write new tests to cover the bug fix.

No, because there is a practical trade-off between perfect unit tests and actually finishing a project :)

It is seldom practical to get 100% code coverage in a non-trivial system. Most developers who write unit tests shoot for the mid to high 90's.
An automated testing tool like Pex can help increase code coverage. It works by searching for hard-to-find edge cases.

Yes we do.
It depends on what language and framework you're using as to how easy that is to achieve though.
We're using Ruby on Rails for my current project. Ruby is very "mockable" in that you can stub/mock out large chunks of your code without having to build in overly complicated class composition and construction designs that you would have to do in other languages.
That said, we only have 100% line coverage (basically what rcov gives you). You still have to think about testing all the required branches.
This is only really possible if you include it from the start as part of your continuous integration build, and break the build if coverage drops below 100% - prompting developers to immediately fix it. Of course you could choose some other number as a target, but if you're starting fresh, there isn't much difference for the effort to get from 90% to 100%
We've also got a bunch of other metrics that break the build if they cross a given threshold as well (cyclomatic complexity, duplication for example) these all go together and help reinforce each other.
Again, you really have to have this stuff in place from the start to keep working at a strict level - either that or set some target you can hit, and gradually ratchet it up till you get to a level you're happy with.
Does doing this add value? I was skeptical at first, but I can honestly say that yes it does. Not primarily because you have thoroughly tested code (although that is definitely a benefit), but more in terms of writing simple code that is easy to test and reason about. If you know you have to have 100% test coverage, you stop writing overly complex if/else/while/try/catch monstrosities and Keep It Simple Stupid.

What I do when I get the chance is to insert statements on every branch of the code that can be grepped for and that record if they've been hit, so that I can do some sort of comparison to see which statements have not been hit. This is a bit of a chore, so I'm not always good about it.
I just built a small UI app to use in charity auctions, that uses MySQL as its DB. Since I really, really didn't want it to break in the middle of an auction, I tried something new.
Since it was in VC6 (C++ + MFC) I defined two macros:
#define TCOV ASSERT(FALSE)
#define _COV ASSERT(TRUE)
and then I sprinkled
TCOV;
throughout the code, on every separate path I could find, and in every routine.
Then I ran the program under the debugger, and every time it hit a TCOV, it would halt.
I would look at the code for any obvious problems, and then edit it to _COV, then continue. The code would recompile on the fly and move on to the next TCOV.
In this way, I slowly, laboriously, eliminated enough TCOV statements so it would run "normally".
After a while, I grepped the code for TCOV, and that showed what code I had not tested. Then I went back and ran it again, making sure to test more branches I had not tried earlier.
I kept doing this until there were no TCOV statements left in the code.
This took a few hours, but in the process I found and fixed several bugs. There is no way I could have had the discipline to make and follow a test plan that would have been that thorough.
Not only did I know I had covered all branches, but it had made me look at every branch while it was running - a very good kind of code review.
So, whether or not you use a coverage tool, this is a good way to root out bugs that would otherwise lurk in the code until a more embarrasing time.

I personally find 100% test coverage to be problematic on multiple levels. First and foremost, you have to make sure you are gaining a tangible, cost-saving benefit from the unit tests you write. In addition, unit tests, like any other code, are CODE. That means it, just like any other code, must be verified for correctness and maintained. That additional time verifying additional code for correctness, and maintaining it and keeping those tests valid in response to changes to business code, adds cost. Achieving 100% test coverage and ensuring you test you're code as thoroughly as possible is a laudable endeavor, but achieving it at any cost...well, is often too costly.
There are many times when covering error and validity checks that are in place to cover fringe or extremely rare, but definitely possible, exceptional cases are an example of code that does not necessarily need to be covered. The amount of time, effort (and ultimately money) that must be invested to achieve coverage of such rare fringe cases is often wasteful in light of other business needs. Properties are often a part of code that, especially with C# 3.0, do not need to be tested as most, if not all, properties behave exactly the same way, and are excessively simple (single-statement return or set.) Investing tremendous amounts of time wrapping unit tests around thousands of properties could quite likely be better invested somewhere else where a greater, more valuable tangible return on that investment can be realized.
Beyond simply achieving 100% test coverage, there are similar problems with trying to set up the "perfect" unit. Mocking frameworks have progressed to an amazing degree these days, and almost anything can be mocked (if you are willing to pay money, TypeMock can actually mock anything and everything, but it does cost a lot.) However, there are often times when dependencies of your code were not written in a mock-able way (this is actually a core problem with the vast bulk of the .NET framework itself.) Investing time to achieve the proper scope of a test is useful, but putting in excessive amounts of time to mock away everything and anything under the face of the sun, adding layers of abstraction and interfaces to make it possible, is again most often a waste of time, effort, and ultimately money.
The ultimate goal with testing shouldn't really be to achieve the ultimate in code coverage. The ultimate goal should be achieving the greatest value per unit time invested in writing unit tests, while covering as much as possible in that time. The best way to achieve this is to take the BDD approach: Specify your concerns, define your context, and verify the expected outcomes occur for any piece of behavior being developed (behavior...not unit.)

On a new project I practice TDD and maintain 100% line coverage. It mostly occurs naturally through TDD. Coverage gaps are usually worth the attention and are easily filled. If the coverage tool I'm using provided branch coverage or something else I'd pay attention to that, although I've never seen branch coverage tell me anything, probably because TDD got there first.
My strongest argument for maintaining 100% coverage (if you care about coverage at all) is that it's much easier to maintain 100% coverage than to manage less than 100% coverage. If you have 100% coverage and it drops, you immediately know why and can easily fix it, because the drop is in code you've just been working on. But if you settle for 95% or whatever, it's easy to miss coverage regressions and you're forever re-reviewing known gaps. It's the exact reason why current best practice requires one's test suite to pass completely. Anything less is harder, not easier, to manage.
My attitude is definitely bolstered by having worked in Ruby for some time, where there are excellent test frameworks and test doubles are easy. 100% coverage is also easy in Python. I might have to lower my standards in an environment with less amenable tools.
I would love to have the same standards on legacy projects, but I've never found it practical to bring a large application with mediocre coverage up to 100% coverage; I've had to settle for 95-99%. It's always been just too much work to go back and cover all the old code. This does not contradict my argument that it's easy to keep a codebase at 100%; it's much easier when you maintain that standard from the beginning.

No because I spent my time adding new features that help the users rather than tricky to write obscure tests that deliver little value. I say unit test the big things, subtle things and things that are fragile.

I generally write unit tests just as a regression-prevention method. When a bug is reported that I have to fix, I create a unit test to ensure that it doesn't re-surface in the future. I may create a few tests for sections of functionality I have to make sure stay intact (or for complex inter-part interactions), but I usually want for the bug fix to tell me one is necessary.

I usually manage to hit 93..100% with my coverage but I don't aim for 100% anymore. I used to do that and while it's doable, it's not worth the effort beyond a certain point because testing blindly obvious usually isn't needed. Good example of this could be the true evaluation branch of the following code snipped
public void method(boolean someBoolean) {
if (someBoolean) {
return;
} else {
/* do lots of stuff */
}
}
However what's important to achieve is to as close to 100% coverage on functional parts of the class as possible since those are the dangerous waters of your application, the misty bog of creeping bugs and undefined behaviour and of course the money-making flea circus.

From Ted Neward blog.
By this point in time, most developers have at least heard of, if not considered adoption of, the Masochistic Testing meme. Fellow NFJS'ers Stuart Halloway and Justin Gehtland have founded a consultancy firm, Relevance, that sets a high bar as a corporate cultural standard: 100% test coverage of their code.
Neal Ford has reported that ThoughtWorks makes similar statements, though it's my understanding that clients sometimes put accidental obstacles in their way of achieving said goal. It's ambitious, but as the ancient American Indian proverb is said to state,
If you aim your arrow at the sun, it will fly higher and farther than if you aim it at the ground.

In many cases it's not worth getting 100% statement coverage, but in some cases, it is worth it. In some cases 100% statement coverage is far too lax a requirement.
The key question to ask is, "what's the impact if the software fails (produces the wrong result)?". In most cases, the impact of a bug is relatively low. For example, maybe you have to go fix the code within a few days and rerun something. However, if the impact is "someone might die in 120 seconds", then that's a huge impact, and you should have a lot more test coverage than just 100% statement coverage.
I lead the Core Infrastructure Initiative Best Practices Badge for the Linux Foundation. We do have 100% statement coverage, but I wouldn't say it was strictly necessary. For a long time we were very close to 100%, and just decided to do that last little percent. We couldn't really justify the last few percent on engineering grounds, though; those last few percent were added purely as "pride of workmanship". I do get a very small extra piece of mind from having 100% coverage, but really it wasn't needed. We were over 90% statement coverage just from normal tests, and that was fine for our purposes. That said, we want the software to be rock-solid, and having 100% statement coverage has helped us get there. It's also easier to get 100% statement coverage today.
It's still useful to measure coverage, even if you don't need 100%. If your tests don't have decent coverage, you should be concerned. A bad test suite can have good statement coverage, but if you don't have good statement coverage, then by definition you have a bad test suite. How much you need is a trade-off: what are the risks (probability and impact) from the software that is totally untested? By definition it's more likely to have errors (you didn't test it!), but if you and your users can live with those risks (probability and impact), it's okay. For many lower-impact projects, I think 80%-90% statement coverage is okay, with better being better.
On the other hand, if people might die from errors in your software, then 100% statement coverage isn't enough. I would at least add branch coverage, and maybe more, to check on the quality of your tests. Standards like DO-178C (for airborne systems) take this approach - if a failure is minor, no big deal, but if a failure could be catastrophic, then much more rigorous testing is required. For example, DO-178C requires MC/DC coverage for the most critical software (the software that can quickly kill people if it makes a mistake). MC/DC is way more strenuous than statement coverage or even branch coverage.

I only have 100% coverage on new pieces of code that have been written with testability in mind. With proper encapsulation, each class and function can have functional unit tests that simultaneously give close to 100% coverage. It's then just a matter of adding some additional tests that cover some edge cases to get you to 100%.
You shouldn't write tests just to get coverage. You should be writing functional tests that test correctness/compliance. By a good functional specification that covers all grounds and a good software design, you can get good coverage for free.

Yes, I have had projects that have had 100% line coverage. See my answer to a similar question.
You can get 100% line coverage, but as others have pointed out here on SO and elsewhere on the internet its maybe only a minimum. When you consider path and branch coverage, there's a lot more work to do.
The other way of looking at it is to try to make your code so simple that its easy to get 100% line coverage.

There's a lot of good information here, I just wanted to add a few more benefits that I've found when aiming for 100% code coverage in the past
It helps reduce code complexity
Since it is easier to remove a line than it is to write a test case, aiming for 100% coverage forces you to justify every line, every branch, every if statement, often leading you to discover a much simpler way to do things that requires fewer tests
It helps develop good test granularity
You can achieve high test coverage by writing lots of small tests testing tiny bits of implementation as you go. This can be useful for tricky bits of logic but doing it for every piece of code no matter how trivial can be tedious, slow you down and become a real maintenance burden also making your code harder to refactor. On the other hand, it is very hard to achieve good test coverage with very high level end to end behavioural tests because typically the thing you are testing involves many components interacting in complicated ways and the permutations of possible cases become very large very quickly. Therefore if you are practical and also want to aim for 100% test coverage, you quickly learn to find a level of granularity for your tests where you can achieve a high level of coverage with a few good tests. You can achieve this by testing components at a level where they are simple enough that you can reasonably cover all the edge cases but also complicated enough that you can test meaningful behaviour. Such tests end up being simple, meaningful and useful for identifying and fixing bugs. I think this is a good skill and improves code quality and maintainability.

A while ago I did a little analysis of coverage in the JUnit implementation, code written and tested by, among others, Kent Beck and David Saff.
From the conclusions:
Applying line coverage to one of the best tested projects in the world, here is what we learned:
Carefully analyzing coverage of code affected by your pull request is more useful than monitoring overall coverage trends against thresholds.
It may be OK to lower your testing standards for deprecated code, but do not let this affect the rest of the code. If you use coverage thresholds on a continuous integration server, consider setting them differently for deprecated code.
There is no reason to have methods with more than 2-3 untested lines of code.
The usual suspects (simple code, dead code, bad weather behavior, …) correspond to around 5% of uncovered code.
In summary, should you monitor line coverage? Not all development teams do, and even in the JUnit project it does not seem to be a standard practice. However, if you want to be as good as the JUnit developers, there is no reason why your line coverage would be below 95%. And monitoring coverage is a simple first step to verify just that.

How do you measure the quality of your unit tests?

If you (or your organization) aspires to thoroughly unit test your code, how do you measure the success or quality of your efforts?
Do you use code coverage, what percentage do you aim for?
Do you find that philosophies like TDD have a better impact than metrics?

My tip is not a way to determine whether you have good unit tests per se, but it's a way to grow a good test suite over time.
Whenever you encounter a bug, either in your development or reported by someone else, fix it twice. You first create a unit test that reproduces the problem. When you've got a failing test, then you go and fix the problem.
If a problem was there in the first place it's a hint about a subtlety about the code or the domain. Adding a test for it lets you make sure it's never going to be reintroduced in the future.
Another interesting aspect about this approach is that it'll help you understand the problem from a higher level before you actually go and look at the intricacies of the code.
Also, +1 for the value and pitfalls of test coverage already mentioned by others.

Code coverage is a useful metric but should be used carefully. Some people take code coverage, specially the percentage covered, a bit too seriously and see it as THE metric for good unit testing.
My experience tells me that more important than trying to get 100% coverage, which is not that easy, people should focus on checking the critical sections are covered. But even then you may get false positives.

I am very much pro-TDD, but I don't place much importance in coverage stats. To me, the success and usefulness of unit tests is felt over a period of development time by the development team, as the tests (a) uncover bugs up front, (b) enable refactoring and change without regression, (c) help flesh out modular, decoupled design, (d) and whatnot.
Or, as Martin Fowler put it, the anecdotal evidence in support of unit tests and TDD is overwhelming, but you cannot measure productivity. Read more on his bliki here: http://www.martinfowler.com/bliki/CannotMeasureProductivity.html

To attain a full measure of confidence in your code you need different levels of testing: unit, integration and functional. I agree with the advice given above that states that testing should be automated (continuous integration) and that unit testing should cover all branches with a variety of edge case datasets. Code coverage tools (e.g. Cobertura, Clover, EMMA etc) can identify holes in your branches, but not in the quality of your test datasets. Static code analysis such as FindBugs, PMD, CPD can identify problem areas in your code before they become an issue and go a long way towards promoting better development practices.
Testing should attempt to replicate the overall environment that the application will be running in as much as possible. It should start from the simplest possible case (unit) to the most complex (functional). In the case of a web application, getting an automated process to run through all the use cases of your website with a variety of browsers is a must so something like SeleniumRC should be in your toolkit.
However, software exists to meet a business need so there is also testing against requirements. This tends to be more of a manual process based on functional (web) tests. Essentially, you'll need to build a traceability matrix against each requirement in the specification and the corresponding functional test. As functional tests are created they are matched up against one or more requirements (e.g. Login as Fred, update account details for password, logout again). This addresses the issue of whether or not the deliverable matches the needs of the business.
Overall, I would advocate a test driven development approach based on some flavour of automated unit testing (JUnit, nUnit etc). For integration testing I would recommend having a test database that is automatically populated at each build with a known dataset that illustrates common use cases but allows for other tests to build on. For functional testing you'll need some kind of user interface robot (SeleniumRC for web, Abbot for Swing etc). Metrics about each can easily be gathered during the build process and displayed on the CI server (eg Hudson) for all developers to see.

If it can break, it should be tested. If it can be tested, it should be automated.

If your primary way of measuring test quality is some automated metric, you've already failed.
Metrics can be misleading, and they can be gamed. And if the metric is the primary (or worse yet, only) means of judging quality they will be gamed (perhaps unintentionally).
Code coverage, for example, is deeply misleading because 100% code coverage is nowhere near complete test coverage. Also, a figure like "80% code coverage" is just as misleading without context. If that coverage is in the most complex bits of code and just misses the code which is so simple it's easy to verify by eye then that's significantly better than if that coverage is biased in the reverse way.
Also, it's important to distinguish between the test-domain of a test (it's feature-set, essentially) and its quality. Test quality is not determined by how much it tests just as code quality isn't determined by a laundry list of features. Test quality is determined by how well a test does its job in testing. That's actually very difficult to sum up in an automated metric.
The next time you go to write a unit test, try this experiment. See how many different ways you can write it such that it has the same code coverage and tests the same code. See whether its possible to write a very poor test that meets these criteria and a very good test as well. I think you may be surprised at the results.
Ultimately there's no substitute for experience and judgment. A human eye, hopefully several eyes, needs to look at the test code and decide if it's good or not.

Code coverage is to testing as testing is to programming. It can only tell you when there is a problem, it can't tell you when everything works. You should have 100% code coverage and beyond. Branches of code logic should be tested with several input values, fully exercising normal, edge, and corner cases.

I normally do TDD, so I write the tests first, which helps me see how I want to be able to use the objects.
Then, when I'm writing the classes, for the most part I can spot common pitfalls (i.e. assumptions that I'm making, e.g. a variable being of a particular type, or range of values) and when these come up I write a specific test for that specific case.
Aside from that, and getting as good as code coverage as possible (sometimes it's not possible to get 100%), you're more or less done. Then, if any bugs do come up in the future, you just make sure you write a test case for it that exposes it first, and will pass when fixed. Then fix as per normal.

Monitoring code coverage rates can be useful but instead of focusing on an arbitrary target rate (80 %, 90 %, 100 % ?) I have found it useful to aim for a positive trend over time.

I think some best practices for unit tests are:
They must be self-contained, i.e. not require too much configuration and external dependencies to run. Let tests build their own dependencies like files and Web sites required for the tests to run.
Use unit tests to reproduce bugs before fixing them. This helps prevent the bugs from surfacing again in the future.
Use a code coverage tool to spot critical code that is not exercised by any unit tests.
Integrate unit tests with nightly builds and release builds.
Publish test result reports and code coverage reports to a Web site where everyone in the team can browse them. The publishing should ideally be automated and integrated into the build system.
Do not expect to reach 100% code coverage unless you develop mission critical software. It can be very costly to reach this level and will for most projects not be worth the effort.

The concept of mutation testing seems promising as a way to measure (test?) the quality of your unit tests. Mutation testing basically means making small "mutations" to your production code and then seeing if any unit test fails. Small mutations typically means changing and to or or < to <=. If one ore more unit tests fail it means the "mutant" was caught. If the mutant survives your unit test suite it means you missed a test. When I apply mutation testing to code with 100% line and branch coverage it typically will find a few spots where I missed tests.
See https://en.wikipedia.org/wiki/Mutation_testing for a description of the concept and links to tools.

An additional technique I try to use is to partition your code into two parts. I've recently blogged about it here. The short description is to maintain your production code in two sets of libraries where one set (hopefully the larger set) has 100% line coverage (or better if you can measure it) and the other set (hopefully a tiny amount of code) has 0% coverage, yes zero percent coverage.
Your designs should allow this partitioning. This should make it easy to see the code that is not covered. Over time you may have ideas about how to move code from the smaller set to the larger set.

In addition to TDD, I found myself writing more sane tests since BDD (e.g. http://rspec.info/)
The everlasting discussion is always to mock or not to mock. Some mocks might become more complex than the code it is testing (which usually points to bad separation of concerns).
Therefore I like the idea of having a metric like: test complexity
per code complexity. or simplified: the number of test lines per lines of code.

Mutation Testing is the foolproof way, know more of it https://pitest.org/ and at https://github.com/vmzakharov/mutate-test-kata/blob/master/MutationTestKata.pdf

How is unit testing better than just testing the entire output of your application as a whole?

I don't understand how an unit test could possibly benefit.
Isn't it sufficient for a tester to test the entire output as a whole rather than doing unit tests?
Thanks.

What you are describing is integration testing. What integration testing will not tell you is which piece of your massive application is not working correctly when your output is no longer correct.
The advantage to unit testing is that you can write a test for each business assumption or algorithm step that you need your program to perform. When someone adds or changes code to your application, you immediately know exactly which step, which piece, and maybe even which line of code is broken when a bug is introduced. The time savings on maintenence for that reason alone makes it worthwhile, but there is an even bigger advantage in that regression bugs cannot be introduced (assuming your tests are running automatically when you build your software). If you fix a bug, and then write a test specifically to catch that bug in the future, there is no way someone could accidentally introduce it again.
The combination of integration testing and unit testing can let you sleep much easier at night, especially when you've checked in a big piece of code that day.

The earlier you catch bugs, the cheaper they are to fix. A bug found during unit testing by the coder is pretty cheap (just fix the darn thing).
A bug found during system or integration testing costs more, since you have to fix it and restart the test cycle.
A bug found by your customer will cost a lot: recoding, retesting, repackaging and so forth. It may also leave a painful boot print on your derriere when you inform management that you didn't catch it during unit testing because you didn't do any, thinking that the system testers would find all the problems :-)
How much money would it cost GM to recall 10,000 cars because the catalytic converter didn't work properly?
Now think of how much it would cost them if they discovered that immediately after those converters were delivered to them, but before they were put into those 10,000 cars.
I think you'll find the latter option to be quite a bit cheaper.
That's one reason why test driven development and continuous integration are (sometimes) a good thing - testing is done all the time.
In addition, unit tests don't check that the program works as a whole, just that each little bit performs as expected. That's often quite a lot more than higher level tests would check.

From my experience:
Integration and functional testing tend to be more indicative of the overall quality of the system, than unit test suit is.
High level testing (functional, acceptance) is a QA tool.
Unit testing is a development tool. Especially in a TDD context, where unit test becomes more of a design implement, rather than that of a quality assurance.
As a result of better design, quality of the entire system improves (indirectly).
Passing unit test suite is meant to ensure that a single component conforms to the developer's intentions (correctness). Acceptance test is the level that covers validity of the system (i.e. system does what user want it to do).
Summary:
Unit test is meant as a development tool first, QA tool second.
Acceptance test is meant as a QA tool.

There is still a need for a certain level of manual testing to be performed but unit testing is used to decrease the number of defects that make it to that stage. Unit testing tests the smallest parts of the system and if they all work the chances of the application as a whole working correctly are increased significantly.
It also assists when adding new features since regression testing can be performed quickly and automatically.

For a complex enough application, testing the entire output as a whole may not cover enough different possibilities. For example, any given application has a huge number of different code paths that can be followed depending on input. In typical testing, there may be many parts of your code that are simply never encountered, because they are only used in certain circumstances, so you can't be sure that any code that isn't run in your test situation, actually works. Also, errors in one section of code may be masked a majority of the time by something else in another section of code, so you may never discover some errors.
It is better to test each function or class separately. That way, the test is easier to write, because you are only testing a certain small section of the code. It's also easier to cover every possible code path when testing, and if you test each small part separately then you can detect errors even when those errors would often be masked by other parts of your code when run in your application.

Do yourself a favor and try out unit testing first. I was quite the skeptic myself until I realized just how darned helpful/powerful unit-tests can be. If you think about it, they aren't really there to add to your workload. They are there to provide you with peace of mind and allow you to continue extending your application while ensuring that your code is solid. You get immediate feedback as to when you may have broke something and this is something of extraordinary value.
To your question regarding why to test small sections of code consider this: Suppose your giant app uses a cool XOR encryption scheme that you wrote and eventually product management changes the requirements of how you generate these encrypted strings. So you say: "Heck, I wrote the the encryption routine so I'll go ahead and make the change. It'll take me 15 minutes and we'll all go home and have a party." Well, perhaps you introduced a bug during this process. But wait!!! Your handy dandy TestXOREncryption() test method immediately tells you that the expected output did not match the input. Bingo, this is why you broke down your unit tests ahead of time into small "units" to test for because in your big giant application you would not have figured this out nearly as fast.
Also, once you get into the frame of mind of regularly writing unit tests you'll realize that although you pay an upfront cost in the beginning in terms of time, you'll get that back 10 fold later in the development cycle when you can quickly identify areas in your code that have introduced problems.
There is no magic bullet with unit tests because your ability to identify problems is only as good as the tests you write. It boils down to delivering a better product and relieving yourself of stress and headaches. =)

Agree with most of the answers. Let's drill down on the topic of speed. Here are some real numbers:
Unit test results in 1 or 2 minutes from a
fresh compile. As true unit tests
(no interaction with external
systems like dbs) they can cover a
lot of logic really fast.
Automated functional test results in 1 or 2 hours. These run on a simplified platform, but sometimes cover multiple systems and the database - which really kills the speed.
Automated integration test results once a day. These exercise the full meal deal, but are so heavy and slow, we can only execute them once a day and it takes a few hours.
Manual regression results come in after a few weeks. We get stuff over to testers a few times a day, but your change isn't realistically regressed for week or two at best.
I want to find out what I broke in 1 or 2 minutes, not a few weeks, not even a few hours. That's where the 10fold ROI on unit tests that people talk about comes from.

This is a tough question to approach because it questions something of such enormous breadth. Here's my short answer, however:
Test Driven Development (or TDD) seeks to prove that every logical unit of an application (or block of code) functions exactly as it should. By making tests as automated as possible for productivity's sake, how could this really be harmful?
By testing every logical piece of code, you can trust the usage of the code up some hierarchy. Say I build an application that relies on a thread-safe stack implementation. Shouldn't the stack be guaranteed to work up at every stage before I build on it?
The key is that if something in the whole application breaks, meaning just looking at the total output/outcome, how do you know where it came from? Well, debugging, of course! Which puts you back where you started. TDD allows you to -hopefully- bypass this most painful stage in development.

Testers generally test end to end functionality. Obviously this is geared for going at user scenarios and has incredible value.
Unit Tests serve a different functionality. The are the developers way of verifying the components they write work correctly in the absence of other features or in combination with other features. This offers a range of value including
Provides un-ignorable documentation
Ability to isolate bugs to specific components
Verify invariants in the code
Provide quick, immediate feedback to changes in the code base.

One place to start is regression testing. Once you find a bug, write a small test that demonstrates the bug, fix it, then make sure the test now passes. In future you can run that test before each release to ensure that the bug has not been reintroduced.
Why do that at a unit level instead of a whole-program level? Speed. In good code it's much faster to isolate a small unit and write a tiny test than to drive a complex program through to the bug point. Then when testing a unit test will generally run significantly faster than an integration test.

Very simply: Unit tests are easier to write, since you're only testing a single method's functionality. And bugs are easier to fix, since you know exactly what method is broken.
But like the other answerers have pointed out, unit tests aren't the end-all-be-all of testing. They're just the smallest piece of the equation.

Probably the single biggest difficulty with software is the sheer number of interacting things, and the most useful technique is to reduce the number of things that have to be considered.
For example, using higher-level languages rather than lower-level improves productivity, because one line is a separate thing, and being able to write a program in fewer lines reduces the number of things.
Procedural programming came about as an attempt to reduce complexity by making it possible to treat a function as a thing. In order to do that, though, we have to be able to think about what the function does in a coherent manner, and with confidence that we're right. (Object-oriented programming does a similar thing, on a larger scale.)
There are several ways to do this. Design-by-contract is a way of exactly specifying what the function does. Using function parameters rather than global variables to call the function and get results reduces the complexity of the function.
Unit testing is one way to verify that the function does what it is supposed to. It's usually possible to test all the code in a function, and sometimes all the execution paths. It is a way to tell if the function works as it should or not. If the function works, we can think about it as a single thing, rather than as multiple things we have to keep track of.
It serves other purposes. Unit tests are usually quick to run, and so can catch bugs quickly, when they're easy to fix. If developers make sure a function passes the tests before being checked in, then the tests are a form of documenting what the function does that is guaranteed correct. The act of creating the tests forces the test writer to think about what the function should be doing. After that, whoever wanted the change can look at the tests to see if he or she was properly understood.
By way of contrast, larger tests are not exhaustive, and so can easily miss lots of bugs. They're bad at localizing bugs. They are usually performed at fairly long intervals, so they may detect a bug some time after it's made. They define parts of the total user experience, but provide no basis to reason about any part of the system. They should not be neglected, but they are not a substitute for unit tests.

As others have stated, the length of the feedback loop and isolation of the problem to a specific component are key benefits of Unit Tests.
Another way that they are complementary to functional tests is how coverage is tracked in some organizations:
Unit tests on code coverage
Functional tests on requirements coverage
Functional tests might miss features that were implemented but are not in the spec.
Being based on the code, Unit tests might miss that a certain feature wasn't implemented, which is where requirements based coverage analysis of Functional testing comes in.
A final point : there are some things that are easier/faster to test at the unit level, especially around error scenarios.

Unit testing will help you identify the source of your bug more clearly and let you know that you have a problem earlier. Both are good to have, but they are different, and unit testing does have benefits.

The software you test is a system. When you are testing it as a whole you are black box testing since you primarily deal with inputs and outputs. Black box testing is great when you have no means of getting inside of the system.
But since you usually do, you create a lot of unit tests that actually test your system as a white box. You can slice system open in many ways and organize your tests depending on system internal structure. White box testing provides you with many more ways of testing and analyzing systems. It's clearly complimentary to Black box testing and should not be considered as an alternative or competing methodology.

What is a reasonable code coverage % for unit tests (and why)? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
If you were to mandate a minimum percentage code-coverage for unit tests, perhaps even as a requirement for committing to a repository, what would it be?
Please explain how you arrived at your answer (since if all you did was pick a number, then I could have done that all by myself ;)

This prose by Alberto Savoia answers precisely that question (in a nicely entertaining manner at that!):
http://www.artima.com/forums/flat.jsp?forum=106&thread=204677
Testivus On Test Coverage
Early one morning, a programmer asked
the great master:
“I am ready to write some unit tests. What code coverage should I aim
for?”
The great master replied:
“Don’t worry about coverage, just write some good tests.”
The programmer smiled, bowed, and
left.
...
Later that day, a second programmer
asked the same question.
The great master pointed at a pot of
boiling water and said:
“How many grains of rice should I put in that pot?”
The programmer, looking puzzled,
replied:
“How can I possibly tell you? It depends on how many people you need to
feed, how hungry they are, what other
food you are serving, how much rice
you have available, and so on.”
“Exactly,” said the great master.
The second programmer smiled, bowed,
and left.
...
Toward the end of the day, a third
programmer came and asked the same
question about code coverage.
“Eighty percent and no less!” Replied the master in a stern voice,
pounding his fist on the table.
The third programmer smiled, bowed,
and left.
...
After this last reply, a young
apprentice approached the great
master:
“Great master, today I overheard you answer the same question about
code coverage with three different
answers. Why?”
The great master stood up from his
chair:
“Come get some fresh tea with me and let’s talk about it.”
After they filled their cups with
smoking hot green tea, the great
master began to answer:
“The first programmer is new and just getting started with testing.
Right now he has a lot of code and no
tests. He has a long way to go;
focusing on code coverage at this time
would be depressing and quite useless.
He’s better off just getting used to
writing and running some tests. He can
worry about coverage later.”
“The second programmer, on the other hand, is quite experience both
at programming and testing. When I
replied by asking her how many grains
of rice I should put in a pot, I
helped her realize that the amount of
testing necessary depends on a number
of factors, and she knows those
factors better than I do – it’s her
code after all. There is no single,
simple, answer, and she’s smart enough
to handle the truth and work with
that.”
“I see,” said the young apprentice,
“but if there is no single simple
answer, then why did you answer the
third programmer ‘Eighty percent and
no less’?”
The great master laughed so hard and
loud that his belly, evidence that he
drank more than just green tea,
flopped up and down.
“The third programmer wants only simple answers – even when there are
no simple answers … and then does not
follow them anyway.”
The young apprentice and the grizzled
great master finished drinking their
tea in contemplative silence.

Code Coverage is a misleading metric if 100% coverage is your goal (instead of 100% testing of all features).
You could get a 100% by hitting all the lines once. However you could still miss out testing a particular sequence (logical path) in which those lines are hit.
You could not get a 100% but still have tested all your 80%/freq used code-paths. Having tests that test every 'throw ExceptionTypeX' or similar defensive programming guard you've put in is a 'nice to have' not a 'must have'
So trust yourself or your developers to be thorough and cover every path through their code. Be pragmatic and don't chase the magical 100% coverage. If you TDD your code you should get a 90%+ coverage as a bonus. Use code-coverage to highlight chunks of code you have missed (shouldn't happen if you TDD though.. since you write code only to make a test pass. No code can exist without its partner test. )

Jon Limjap makes a good point - there is not a single number that is going to make sense as a standard for every project. There are projects that just don't need such a standard. Where the accepted answer falls short, in my opinion, is in describing how one might make that decision for a given project.
I will take a shot at doing so. I am not an expert in test engineering and would be happy to see a more informed answer.
When to set code coverage requirements
First, why would you want to impose such a standard in the first place? In general, when you want to introduce empirical confidence in your process. What do I mean by "empirical confidence"? Well, the real goal correctness. For most software, we can't possibly know this across all inputs, so we settle for saying that code is well-tested. This is more knowable, but is still a subjective standard: It will always be open to debate whether or not you have met it. Those debates are useful and should occur, but they also expose uncertainty.
Code coverage is an objective measurement: Once you see your coverage report, there is no ambiguity about whether standards have been met are useful. Does it prove correctness? Not at all, but it has a clear relationship to how well-tested the code is, which in turn is our best way to increase confidence in its correctness. Code coverage is a measurable approximation of immeasurable qualities we care about.
Some specific cases where having an empirical standard could add value:
To satisfy stakeholders. For many projects, there are various actors who have an interest in software quality who may not be involved in the day-to-day development of the software (managers, technical leads, etc.) Saying "we're going to write all the tests we really need" is not convincing: They either need to trust entirely, or verify with ongoing close oversight (assuming they even have the technical understanding to do so.) Providing measurable standards and explaining how they reasonably approximate actual goals is better.
To normalize team behavior. Stakeholders aside, if you are working on a team where multiple people are writing code and tests, there is room for ambiguity for what qualifies as "well-tested." Do all of your colleagues have the same idea of what level of testing is good enough? Probably not. How do you reconcile this? Find a metric you can all agree on and accept it as a reasonable approximation. This is especially (but not exclusively) useful in large teams, where leads may not have direct oversight over junior developers, for instance. Networks of trust matter as well, but without objective measurements, it is easy for group behavior to become inconsistent, even if everyone is acting in good faith.
To keep yourself honest. Even if you're the only developer and only stakeholder for your project, you might have certain qualities in mind for the software. Instead of making ongoing subjective assessments about how well-tested the software is (which takes work), you can use code coverage as a reasonable approximation, and let machines measure it for you.
Which metrics to use
Code coverage is not a single metric; there are several different ways of measuring coverage. Which one you might set a standard upon depends on what you're using that standard to satisfy.
I'll use two common metrics as examples of when you might use them to set standards:
Statement coverage: What percentage of statements have been executed during testing? Useful to get a sense of the physical coverage of your code: How much of the code that I have written have I actually tested?
This kind of coverage supports a weaker correctness argument, but is also easier to achieve. If you're just using code coverage to ensure that things get tested (and not as an indicator of test quality beyond that) then statement coverage is probably sufficient.
Branch coverage: When there is branching logic (e.g. an if), have both branches been evaluated? This gives a better sense of the logical coverage of your code: How many of the possible paths my code may take have I tested?
This kind of coverage is a much better indicator that a program has been tested across a comprehensive set of inputs. If you're using code coverage as your best empirical approximation for confidence in correctness, you should set standards based on branch coverage or similar.
There are many other metrics (line coverage is similar to statement coverage, but yields different numeric results for multi-line statements, for instance; conditional coverage and path coverage is similar to branch coverage, but reflect a more detailed view of the possible permutations of program execution you might encounter.)
What percentage to require
Finally, back to the original question: If you set code coverage standards, what should that number be?
Hopefully it's clear at this point that we're talking about an approximation to begin with, so any number we pick is going to be inherently approximate.
Some numbers that one might choose:
100%. You might choose this because you want to be sure everything is tested. This doesn't give you any insight into test quality, but does tell you that some test of some quality has touched every statement (or branch, etc.) Again, this comes back to degree of confidence: If your coverage is below 100%, you know some subset of your code is untested.
Some might argue that this is silly, and you should only test the parts of your code that are really important. I would argue that you should also only maintain the parts of your code that are really important. Code coverage can be improved by removing untested code, too.
99% (or 95%, other numbers in the high nineties.) Appropriate in cases where you want to convey a level of confidence similar to 100%, but leave yourself some margin to not worry about the occasional hard-to-test corner of code.
80%. I've seen this number in use a few times, and don't entirely know where it originates. I think it might be a weird misappropriation of the 80-20 rule; generally, the intent here is to show that most of your code is tested. (Yes, 51% would also be "most", but 80% is more reflective of what most people mean by most.) This is appropriate for middle-ground cases where "well-tested" is not a high priority (you don't want to waste effort on low-value tests), but is enough of a priority that you'd still like to have some standard in place.
I haven't seen numbers below 80% in practice, and have a hard time imagining a case where one would set them. The role of these standards is to increase confidence in correctness, and numbers below 80% aren't particularly confidence-inspiring. (Yes, this is subjective, but again, the idea is to make the subjective choice once when you set the standard, and then use an objective measurement going forward.)
Other notes
The above assumes that correctness is the goal. Code coverage is just information; it may be relevant to other goals. For instance, if you're concerned about maintainability, you probably care about loose coupling, which can be demonstrated by testability, which in turn can be measured (in certain fashions) by code coverage. So your code coverage standard provides an empirical basis for approximating the quality of "maintainability" as well.

Code coverage is great, but functionality coverage is even better. I don't believe in covering every single line I write. But I do believe in writing 100% test coverage of all the functionality I want to provide (even for the extra cool features I came with myself and which were not discussed during the meetings).
I don't care if I would have code which is not covered in tests, but I would care if I would refactor my code and end up having a different behaviour. Therefore, 100% functionality coverage is my only target.

My favorite code coverage is 100% with an asterisk. The asterisk comes because I prefer to use tools that allow me to mark certain lines as lines that "don't count". If I have covered 100% of the lines which "count", I am done.
The underlying process is:
I write my tests to exercise all the functionality and edge cases I can think of (usually working from the documentation).
I run the code coverage tools
I examine any lines or paths not covered and any that I consider not important or unreachable (due to defensive programming) I mark as not counting
I write new tests to cover the missing lines and improve the documentation if those edge cases are not mentioned.
This way if I and my collaborators add new code or change the tests in the future, there is a bright line to tell us if we missed something important - the coverage dropped below 100%. However, it also provides the flexibility to deal with different testing priorities.

I'd have another anectode on test coverage I'd like to share.
We have a huge project wherein, over twitter, I noted that, with 700 unit tests, we only have 20% code coverage.
Scott Hanselman replied with words of wisdom:
Is it the RIGHT 20%? Is it the 20%
that represents the code your users
hit the most? You might add 50 more
tests and only add 2%.
Again, it goes back to my Testivus on Code Coverage Answer. How much rice should you put in the pot? It depends.

Many shops don't value tests, so if you are above zero at least there is some appreciation of worth - so arguably non-zero isn't bad as many are still zero.
In the .Net world people often quote 80% as reasonble. But they say this at solution level. I prefer to measure at project level: 30% might be fine for UI project if you've got Selenium, etc or manual tests, 20% for the data layer project might be fine, but 95%+ might be quite achievable for the business rules layer, if not wholly necessary. So the overall coverage may be, say, 60%, but the critical business logic may be much higher.
I've also heard this: aspire to 100% and you'll hit 80%; but aspire to 80% and you'll hit 40%.
Bottom line: Apply the 80:20 rule, and let your app's bug count guide you.

For a well designed system, where unit tests have driven the development from the start i would say 85% is a quite low number. Small classes designed to be testable should not be hard to cover better than that.
It's easy to dismiss this question with something like:
Covered lines do not equal tested logic and one should not read too much into the percentage.
True, but there are some important points to be made about code coverage. In my experience this metric is actually quite useful, when used correctly. Having said that, I have not seen all systems and i'm sure there are tons of them where it's hard to see code coverage analysis adding any real value. Code can look so different and the scope of the available test framework can vary.
Also, my reasoning mainly concerns quite short test feedback loops. For the product that I'm developing the shortest feedback loop is quite flexible, covering everything from class tests to inter process signalling. Testing a deliverable sub-product typically takes 5 minutes and for such a short feedback loop it is indeed possible to use the test results (and specifically the code coverage metric that we are looking at here) to reject or accept commits in the repository.
When using the code coverage metric you should not just have a fixed (arbitrary) percentage which must be fulfilled. Doing this does not give you the real benefits of code coverage analysis in my opinion. Instead, define the following metrics:
Low Water Mark (LWM), the lowest number of uncovered lines ever seen in the system under test
High Water Mark (HWM), the highest code coverage percentage ever seen for the system under test
New code can only be added if we don't go above the LWM and we don't go below the HWM. In other words, code coverage is not allowed to decrease, and new code should be covered. Notice how i say should and not must (explained below).
But doesn't this mean that it will be impossible to clean away old well-tested rubbish that you have no use for anymore? Yes, and that's why you have to be pragmatic about these things. There are situations when the rules have to be broken, but for your typical day-to-day integration my experience it that these metrics are quite useful. They give the following two implications.
Testable code is promoted.
When adding new code you really have to make an effort to make the code testable, because you will have to try and cover all of it with your test cases. Testable code is usually a good thing.
Test coverage for legacy code is increasing over time.
When adding new code and not being able to cover it with a test case, one can try to cover some legacy code instead to get around the LWM rule. This sometimes necessary cheating at least gives the positive side effect that the coverage of legacy code will increase over time, making the seemingly strict enforcement of these rules quite pragmatic in practice.
And again, if the feedback loop is too long it might be completely unpractical to setup something like this in the integration process.
I would also like to mention two more general benefits of the code coverage metric.
Code coverage analysis is part of the dynamic code analysis (as opposed to the static one, i.e. Lint). Problems found during the dynamic code analysis (by tools such as the purify family, http://www-03.ibm.com/software/products/en/rational-purify-family) are things like uninitialized memory reads (UMR), memory leaks, etc. These problems can only be found if the code is covered by an executed test case. The code that is the hardest to cover in a test case is usually the abnormal cases in the system, but if you want the system to fail gracefully (i.e. error trace instead of crash) you might want to put some effort into covering the abnormal cases in the dynamic code analysis as well. With just a little bit of bad luck, a UMR can lead to a segfault or worse.
People take pride in keeping 100% for new code, and people discuss testing problems with a similar passion as other implementation problems. How can this function be written in a more testable manner? How would you go about trying to cover this abnormal case, etc.
And a negative, for completeness.
In a large project with many involved developers, everyone is not going to be a test-genius for sure. Some people tend to use the code coverage metric as proof that the code is tested and this is very far from the truth, as mentioned in many of the other answers to this question. It is ONE metric that can give you some nice benefits if used properly, but if it is misused it can in fact lead to bad testing. Aside from the very valuable side effects mentioned above a covered line only shows that the system under test can reach that line for some input data and that it can execute without hanging or crashing.

If this were a perfect world, 100% of code would be covered by unit tests. However, since this is NOT a perfect world, it's a matter of what you have time for. As a result, I recommend focusing less on a specific percentage, and focusing more on the critical areas. If your code is well-written (or at least a reasonable facsimile thereof) there should be several key points where APIs are exposed to other code.
Focus your testing efforts on these APIs. Make sure that the APIs are 1) well documented and 2) have test cases written that match the documentation. If the expected results don't match up with the docs, then you have a bug in either your code, documentation, or test cases. All of which are good to vet out.
Good luck!

Code coverage is just another metric. In and of itself, it can be very misleading (see www.thoughtworks.com/insights/blog/are-test-coverage-metrics-overrated). Your goal should therefore not be to achieve 100% code coverage but rather to ensure that you test all relevant scenarios of your application.

I prefer to do BDD, which uses a combination of automated acceptance tests, possibly other integration tests, and unit tests. The question for me is what the target coverage of the automated test suite as a whole should be.
That aside, the answer depends on your methodology, language and testing and coverage tools. When doing TDD in Ruby or Python it's not hard to maintain 100% coverage, and it's well worth doing so. It's much easier to manage 100% coverage than 90-something percent coverage. That is, it's much easier to fill coverage gaps as they appear (and when doing TDD well coverage gaps are rare and usually worth your time) than it is to manage a list of coverage gaps that you haven't gotten around to and miss coverage regressions due to your constant background of uncovered code.
The answer also depends on the history of your project. I've only found the above to be practical in projects managed that way from the start. I've greatly improved the coverage of large legacy projects, and it's been worth doing so, but I've never found it practical to go back and fill every coverage gap, because old untested code is not well understood enough to do so correctly and quickly.

85% would be a good starting place for checkin criteria.
I'd probably chose a variety of higher bars for shipping criteria - depending on the criticality of the subsystems/components being tested.

Code coverage is great but only as long as the benefits that you get from it outweigh the cost/effort of achieving it.
We have been working to a standard of 80% for some time, however we have just made the decison to abandon this and instead be more focused on our testing. Concentrating on the complex business logic etc,
This decision was taken due to the increasing amount of time we spent chasing code coverage and maintaining existing unit tests. We felt we had got to the point where the benefit we were getting from our code coverage was deemed to be less than the effort that we had to put in to achieve it.

I use cobertura, and whatever the percentage, I would recommend keeping the values in the cobertura-check task up-to-date. At the minimum, keep raising totallinerate and totalbranchrate to just below your current coverage, but never lower those values. Also tie in the Ant build failure property to this task. If the build fails because of lack of coverage, you know someone's added code but hasn't tested it. Example:
<cobertura-check linerate="0"
branchrate="0"
totallinerate="70"
totalbranchrate="90"
failureproperty="build.failed" />

When I think my code isn't unit tested enough, and I'm not sure what to test next, I use coverage to help me decide what to test next.
If I increase coverage in a unit test - I know this unit test worth something.
This goes for code that is not covered, 50% covered or 97% covered.

Short answer: 60-80%
Long answer:
I think it totally depends on the nature of your project. I typically start a project by unit testing every practical piece. By the first "release" of the project you should have a pretty good base percentage based on the type of programming you are doing. At that point you can start "enforcing" a minimum code coverage.

If you've been doing unit testing for a decent amount of time, I see no reason for it not to be approaching 95%+. However, at a minimum, I've always worked with 80%, even when new to testing.
This number should only include code written in the project (excludes frameworks, plugins, etc.) and maybe even exclude certain classes composed entirely of code written of calls to outside code. This sort of call should be mocked/stubbed.

Generally speaking, from the several engineering excellence best practices papers that I have read, 80% for new code in unit tests is the point that yields the best return. Going above that CC% yields a lower amount of defects for the amount of effort exerted. This is a best practice that is used by many major corporations.
Unfortunately, most of these results are internal to companies, so there are no public literatures that I can point you to.

My answer to this conundrum is to have 100% line coverage of the code you can test and 0% line coverage of the code you can't test.
My current practice in Python is to divide my .py modules into two folders: app1/ and app2/ and when running unit tests calculate the coverage of those two folders and visually check (I must automate this someday) that app1 has 100% coverage and app2 has 0% coverage.
When/if I find that these numbers differ from standard I investigage and alter the design of the code so that coverage conforms to the standard.
This does mean that I can recommend achieving 100% line coverage of library code.
I also occasionally review app2/ to see if I could possible test any code there, and If I can I move it into app1/
Now I'm not too worried about the aggregate coverage because that can vary wildly depending on the size of the project, but generally I've seen 70% to over 90%.
With python, I should be able to devise a smoke test which could automatically run my app while measuring coverage and hopefully gain an aggreagate of 100% when combining the smoke test with unittest figures.

Check out Crap4j. It's a slightly more sophisticated approach than straight code coverage. It combines code coverage measurements with complexity measurements, and then shows you what complex code isn't currently tested.

Viewing coverage from another perspective: Well-written code with a clear flow of control is the easiest to cover, the easiest to read, and usually the least buggy code. By writing code with clearness and coverability in mind, and by writing the unit tests in parallel with the code, you get the best results IMHO.

In my opinion, the answer is "It depends on how much time you have". I try to achieve 100% but I don't make a fuss if I don't get it with the time I have.
When I write unit tests, I wear a different hat compared to the hat I wear when developing production code. I think about what the tested code claims to do and what are the situations that can possible break it.
I usually follow the following criteria or rules:
That the Unit Test should be a form of documentation on what's the expected behavior of my codes, ie. the expected output given a certain input and the exceptions it may throw that clients may want to catch (What the users of my code should know?)
That the Unit Test should help me discover the what if conditions that I may not yet have thought of. (How to make my code stable and robust?)
If these two rules doesn't produce 100% coverage then so be it. But once, I have the time, I analyze the uncovered blocks and lines and determine if there are still test cases without unit tests or if the code needs to be refactored to eliminate the unecessary codes.

It depends greatly on your application. For example, some applications consist mostly of GUI code that cannot be unit tested.

I don't think there can be such a B/W rule.
Code should be reviewed, with particular attention to the critical details.
However, if it hasn't been tested, it has a bug!

Depending on the criticality of the code, anywhere from 75%-85% is a good rule of thumb.
Shipping code should definitely be tested more thoroughly than in house utilities, etc.

This has to be dependent on what phase of your application development lifecycle you are in.
If you've been at development for a while and have a lot of implemented code already and are just now realizing that you need to think about code coverage then you have to check your current coverage (if it exists) and then use that baseline to set milestones each sprint (or an average rise over a period of sprints), which means taking on code debt while continuing to deliver end user value (at least in my experience the end user doesn't care one bit if you've increased test coverage if they don't see new features).
Depending on your domain it's not unreasonable to shoot for 95%, but I'd have to say on average your going to be looking at an average case of 85% to 90%.

I think the best symptom of correct code coverage is that amount of concrete problems unit tests help to fix is reasonably corresponds to size of unit tests code you created.

I think that what may matter most is knowing what the coverage trend is over time and understanding the reasons for changes in the trend. Whether you view the changes in the trend as good or bad will depend upon your analysis of the reason.

We were targeting >80% till few days back, But after we used a lot of Generated code, We do not care for %age, but rather make reviewer take a call on the coverage required.

From the Testivus posting I think the answer context should be the second programmer.
Having said this from a practical point of view we need parameter / goals to strive for.
I consider that this can be "tested" in an Agile process by analyzing the code we have the architecture, functionality (user stories), and then come up with a number. Based on my experience in the Telecom area I would say that 60% is a good value to check.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js