How to deal with code coverage? - unit-testing

The other day we had a hard discussion between different developers and project leads about code coverage tools and the use of the corresponding reports.
Do you use code coverage in your projects, and if not, why not?
Is code coverage a fixed part of your builds or continuous integration, or do you just use it from time to time?
How do you deal with the numbers derived from the reports?

We use code coverage to verify that we aren't missing big parts in our testing efforts. Once a milestone or so we run a full coverage report and spend a few days analyzing the results, adding test coverage for areas we missed.
We don't run it every build because I don't know that we would analyze it on a regular enough basis to justify that.
We analyze the reports for large blocks of unhit code. We've found this to be the most efficient use. In the past we would try to hit a particular code coverage target, but beyond a certain point the returns diminish sharply. Instead, it's better to use code coverage as a tool to make sure you didn't forget anything.

1) Yes we do use code coverage
2) Yes it is part of the CI build (why wouldn't it be?)
3) The important part - we don't look for 100% coverage. What we do look for is buggy/complex code; that's easy to find from your unit tests, and the Devs/Leads will know the delicate parts of the system. We make sure the coverage of such code areas is good and increases over time, rather than decreasing as people hack in more fixes without the requisite tests.

Code coverage tells you how big your "bug catching" net is, but it doesn't tell you how big the holes are in your net.
Use it as an indicator to gauge your testing efforts but not as an absolute metric.
It is possible to write tests that give you 100% coverage and yet do not test anything at all.
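For illustration, a minimal, hypothetical example (JUnit; the class and method names are invented) of a test that yields full coverage while testing nothing:

import org.junit.Test;

public class CalculatorTest {

    static int add(int a, int b) {
        return a + b;
    }

    @Test
    public void exercisesAddWithoutTestingIt() {
        // Every line of add() is executed, so coverage reports 100%,
        // but there is no assertion: the test passes even if add() is wrong.
        add(2, 2);
    }
}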

The way to look at code coverage is to see how much is NOT covered and find out why it is not covered. Code coverage simply tells us which lines of code are hit when the unit tests run; it does not tell us whether the code works correctly. 100% code coverage is a good number, but in medium/large projects it is very hard to achieve.

I like to measure code coverage on any non-trivial project. As has been mentioned, try not to get too caught up in achieving an arbitrary/magical percentage. There are better metrics, such as riskiness based on complexity, coverage by package/namespace, etc.
Take a look at this sample Clover dashboard for similar ideas.

We run it as part of the build, and we check that coverage does not drop below some value, like 85%.
I also generate an automatic "Top 10 largest uncovered methods" report, to know where to start covering.
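A rough sketch of how such a report could be produced, assuming a JaCoCo XML report (jacoco.xml); the answer does not name its coverage tool, so the report format and all names below are assumptions:

// Hypothetical "Top 10 largest uncovered methods" report from a JaCoCo XML file.
// Usage: java TopUncoveredMethods path/to/jacoco.xml   (requires Java 16+ for records)
import java.io.File;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class TopUncoveredMethods {

    record MethodCoverage(String owner, String name, int missedLines) {}

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // jacoco.xml declares a DTD that we don't need to resolve.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document report = factory.newDocumentBuilder().parse(new File(args[0]));

        List<MethodCoverage> methods = new ArrayList<>();
        NodeList classes = report.getElementsByTagName("class");
        for (int i = 0; i < classes.getLength(); i++) {
            Element clazz = (Element) classes.item(i);
            NodeList methodNodes = clazz.getElementsByTagName("method");
            for (int j = 0; j < methodNodes.getLength(); j++) {
                Element method = (Element) methodNodes.item(j);
                int missed = missedLines(method);
                if (missed > 0) {
                    methods.add(new MethodCoverage(
                        clazz.getAttribute("name"), method.getAttribute("name"), missed));
                }
            }
        }

        // Sort by missed lines, descending, and print the ten worst offenders.
        methods.sort(Comparator.comparingInt(MethodCoverage::missedLines).reversed());
        methods.stream().limit(10).forEach(m ->
            System.out.printf("%-60s %-30s %d missed lines%n", m.owner(), m.name(), m.missedLines()));
    }

    // Read the "missed" count of the method's LINE counter, if present.
    private static int missedLines(Element method) {
        NodeList counters = method.getElementsByTagName("counter");
        for (int i = 0; i < counters.getLength(); i++) {
            Element counter = (Element) counters.item(i);
            if ("LINE".equals(counter.getAttribute("type"))) {
                return Integer.parseInt(counter.getAttribute("missed"));
            }
        }
        return 0;
    }
}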

Many teams switching to Agile/XP use code coverage as an indirect way of gauging the ROI of their test automation efforts.
I think of it as an experiment - there's a hypothesis that "if we start writing unit tests, our code coverage will improve" - and it makes sense to collect the corresponding observation automatically, via CI, report it in a graph, etc.
You use the results to detect rough spots: if the trend toward more coverage levels off at some point, for instance, you might stop to ask what's going on. Perhaps the team has trouble writing tests that are relevant.

We use code coverage to assure that we have no major holes in our tests, and it's run nightly in our CI.
Since we also have a full set of Selenium web tests that run all the way through the stack, we also do an additional coverage trick:
We set up the web-application with coverage running. Then we run the full automated test battery of selenium tests. Some of these are smoke tests only.
When the full suite of tests has been run, we can identify suspected dead code simply by looking at the coverage and inspecting code. This is really nice when working on large projects, because you can have big branches of dead code after some time.
We don't really have any fixed metrics on how often we do this, but it's all set up to run with a keypress or two.

We do use code coverage; it is integrated into our nightly build. There are several tools to analyze the coverage data; they commonly report:
statement coverage
branch coverage
MC/DC coverage
We expect to reach 90%+ statement and branch coverage. MC/DC coverage, on the other hand, gives the test team a broader picture. For any uncovered code, we expect a justification record.
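To make the distinction concrete, a small hypothetical Java example (all names invented): one test gives 100% statement coverage of the method, yet leaves a branch untaken and the condition combinations that MC/DC cares about unexercised.

import static org.junit.Assert.assertTrue;
import org.junit.Test;

public class CoverageKindsExample {

    static boolean grantAccess(boolean isAdmin, boolean isOwner) {
        boolean allowed = false;
        if (isAdmin || isOwner) {   // two conditions, one decision
            allowed = true;
        }
        return allowed;
    }

    @Test
    public void adminIsAllowed() {
        // 100% statement coverage of grantAccess, but the decision never
        // evaluates to false (the "else" branch is never taken), and neither
        // condition is shown to independently affect the outcome (no MC/DC).
        assertTrue(grantAccess(true, true));
    }
}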

I find it depends on the code itself. I won't repeat Joel's statements from SO podcast #38, but the upshot is 'try to be pragmatic'.
Code coverage is great in core elements of the app.
I look at the code as a dependency tree: if the leaves work (e.g. basic UI or code calling a unit-tested DAL) and I've tested them when I developed or updated them, there is a good chance they will work, and if there is a bug it won't be difficult to find or fix, so the time taken to mock up some tests would probably be wasted. Yes, there is a risk that updates to the code they depend on may affect them, but again, it's a case-by-case thing, and the unit tests for the code they depend on should cover it.
When it comes to the trunk or branches of the code, yes, code coverage of functionality (as opposed to each function) is very important.
For example, I recently was on a team that built an app that required a bundle of calculations to calculate carbon emissions. I wrote a suite of tests that tested each and every calculation, and in doing so was happy to see that the dependency injection pattern was working fine.
Inevitably, due to a government act change, we had to add a parameter to the equations, and all 100+ tests broke.
I realised that updating them meant that, over and above testing for a typo (which I could test once), I was unit/regression-testing mathematics, so I ended up spending the time building another area of the app instead.

1) Yes we do measure simple node coverage, because:
it is easy to do with our current project* (Rails web app)
it encourages our developers to write tests (some come from backgrounds where testing was ad-hoc)
2) Code coverage is part of our continuous integration process.
3) The numbers from the reports are used to:
enforce a minimum level of coverage (95%, otherwise the build fails)
find sections of code which should be tested
There are parts of the system where testing is not all that helpful (usually where you need to make use of mock-objects to deal with external systems). But generally having good coverage makes it easier to maintain a project. One knows that fixes or new features do not break existing functionality.
*Details for setting up required coverage for Rails: Min Limit 95 Ahead

Related

How do I get SonarQube to analyse Code Smells in unit tests but not count them for coverage reports?

I have a C++ project being analysed with the commercial SonarQube plugin.
My project has, in my mind, an artificially high code coverage percentage reported, as both the "production" source code lines and the unit test code lines are counted. It is quite difficult to write many unit test lines that are not run as part of the unit testing, so they give an immediate boost to the coverage reports.
Is it possible to have the Unit Test code still analysed for Code Smells but not have it count towards the test coverage metric?
I have tried setting the sonar.tests=./Tests parameter (where ./Tests is the directory with my test code). This seems to exclude the test code from all analysis, leaving smells undetected. I would rather check that the test code is of good quality than hope it is obeying the rules applied to the project.
I tried adding sonar.test.inclusions=./Tests/* in combination with the above. However, either I got the file path syntax wrong, or setting this variable causes a complete omission of the test code, so that it no longer appears under the 'Code' tab at all, as well as being excluded.
The documentation on Narrowing the Focus of what is analysed is not all that clear on what the expected behaviour is, at least to me. Any help would be greatly appreciated, as going through every permutation will be quite confusing.
Perhaps I should just accept the idea that with ~300 lines of "production" code and 900 lines of stubs, mocks and unit tests a value of 75% test coverage could mean running 0 lines of "production" code. I checked and currently, my very simple application is at about that ratio of test code to "production" code. I'd expect the ratio to move more towards 50:50 over time but it might not do.
One solution I found was to have two separate SonarQube projects for a single repository. The first you set up in the normal way, with the test code excluded via sonar.tests=./Tests. The second is a "-test" project where you exclude all your production code.
This adds some admin and setup but guarantees that coverage for the normal project is a percentage of only the production code and that you have SonarQube Analysis performed on all your test code (which can also have coverage tracked and would be expected to be very high).
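For concreteness, a hypothetical sketch of the two sonar-project.properties files such a setup might use (./Src is an assumed production-code directory and the project keys are invented; only the properties already mentioned above plus sonar.projectKey are shown):

# Main project: production code is analysed and covered,
# test code is declared as tests and excluded from coverage.
sonar.projectKey=my-app
sonar.sources=./Src
sonar.tests=./Tests

# Companion "-test" project: the test code is analysed as ordinary
# source code, so smells are reported on it.
sonar.projectKey=my-app-test
sonar.sources=./Tests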
I struggle to remember where I found this suggestion a long time ago. Possibly somewhere in the SonarQube Community Forum, which is worth a look if you are stuck on something.

Quantifying Unit Test Coverage

Our company is trying to enforce test-driven development, and as a development manager I'm trying to define what that acceptance criteria really means. We mostly follow an agile methodology, and each story going to test needs some level of assurance (entrance criteria) of unit test coverage. I'm interested to hear how you enforce this (if you do) effectively, as a gate, within your companies.
What you don't want is to set any code coverage requirements. Any requirement like that can and will be gamed.
Instead, I'd look at measuring RTF: Running, Tested Features. See http://xprogramming.com/articles/jatrtsmetric/
For our Ruby on Rails app, we use a code metrics gem called SimpleCov. I am not sure what language your company uses, but I am sure there is a similar tool for it. SimpleCov is great for Ruby because it provides an extensive GUI, highlighting down to the line whether code was covered, skipped (filtered out), or missed.
We started tracking our code coverage two months ago. We began at 30% and are now near 60%. Depending on the age of your company's application, you may want to raise your coverage expectations to 80% or higher... According to SimpleCov, anything 91% or higher is "in the green", and below 80% is "in the red" (for great color analogies).
I feel that the most important thing is to make sure you have your crucial features tested -- such features may have the most lines of code to be tested. Getting those done first will drastically increase coverage.
Another thing to note, if you use a library like SimpleCov, you may be able to skip (filter out) lines of code, or even entire files, that you feel are legacy and may lower your coverage. That is another reason why our coverage almost doubled in 2 months.
Again, we are new to measuring code coverage, but strongly believe in its benefit to our current testing suite and application development.

Determining which tests cover a line of code

Is there a way to determine the set of unit tests that will potentially execute a given line of code? In other words, can you automatically determine not just whether a given line is covered, but the actual set of tests that cover it?
Consider a big code base with, say, 50K unit tests. Clearly, it could take a LONG time to run them all--hours, if not days. Working in such a code base, you'd like to be able to execute some subset of all the unit tests, including only those that cover the line (or lines) that you just touched. Sure, you could find some manually and run those, but I'm looking for a way to do it faster, and more comprehensively.
If I'm thinking about this correctly, it should be possible. A tool could statically traverse all the code paths leading out of each unit test, and come up with a slice of the program reachable from that test. And you should then (theoretically) be able to compute the set of unit tests that include a given line in their slice, meaning that the line could be executed by that test ("could" rather than "will" because the actual code path will only be determined at run time based on the inputs or other conditions). A given line of code could have a massive number of tests that execute it (say, code in a shared library), whereas other lines might have few (or no) tests covering them.
So:
Is my reasoning sound on this idea? Could it theoretically be done, or is there something I'm leaving out?
Is there already a tool out there that can do this? Or, is this a common thing with a name I haven't run into? Pointers to tools in the java world, or to general research on the subject, would be appreciated.
JetBrains's dotCover also now has this feature for .NET code. It can be accessed from the dotCover menu with the option "Show covering tests" or by pressing Ctrl + Alt + K.
I'm pretty sure Clover will show you which tests validate each line of code. So you could manually execute the tests by looking at the coverage reports. They also have a new API which you might be able to use to write an IDE plugin that would let you execute the tests that cover a line of code.
The following presentation discusses how to compute the program slice executed by a unit test. It answers the question of, "can you determine the test coverage without executing the program?" and basically sketches the idea you described... with the additional bit of work to actually implement it.
You might note that computing a program slice isn't a computationally cheap task. I'd guess that computing a slice (a symbolic computation) is generally slower than executing a unit test, so I'm not sure that you'll save any time doing this. And a slice is a conservative approximation of the affected part of the program, so the answer you get back will include program parts that actually don't get executed.
You might be better off instead to run all those 50,000 unit tests once, and collect the coverage data for each one. In this case, when some code fragment is updated, it is possible to determine statically whether the code a particular test executes includes the code you changed or not, and thus you can identify tests that have to be executed again. You can skip executing the rest of the tests.
My company builds a family of test coverage tools. Our next release of these tools will have this kind of incremental regression testing capability.
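A rough sketch of that "record per-test coverage once, then select tests statically" idea in Java (all names and the shape of the recorded data are hypothetical; real tools keep per-test coverage in their own formats):

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class IncrementalTestSelector {

    // testName -> source files that test touched in the last recorded full run
    private final Map<String, Set<String>> coverageByTest = new HashMap<>();

    public void recordCoverage(String testName, Set<String> coveredFiles) {
        coverageByTest.put(testName, new HashSet<>(coveredFiles));
    }

    // Return the tests that need to be re-run for the given changed files.
    public Set<String> testsToRerun(Set<String> changedFiles) {
        Set<String> selected = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : coverageByTest.entrySet()) {
            for (String file : entry.getValue()) {
                if (changedFiles.contains(file)) {
                    selected.add(entry.getKey());
                    break;
                }
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        IncrementalTestSelector selector = new IncrementalTestSelector();
        selector.recordCoverage("CartTest.addsItem", Set.of("Cart.java", "Item.java"));
        selector.recordCoverage("InvoiceTest.totals", Set.of("Invoice.java", "Money.java"));
        selector.recordCoverage("MoneyTest.rounding", Set.of("Money.java"));

        // Only the tests that touched Money.java on the last run are selected.
        System.out.println(selector.testsToRerun(Set.of("Money.java")));
        // -> [InvoiceTest.totals, MoneyTest.rounding]
    }
}

Note that newly added tests or source files have no recorded coverage yet, so a periodic full run is still needed to refresh the map.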
This is a feature that the JMockit Coverage tool (for Java) provides, although it shows the tests that did cover a given line of production code in the last run, not the tests "that will potentially execute a given line of code".
Typically, however, you would have a Jenkins (or whatever) build of the project, where all tests are executed and an HTML coverage report is generated. Then it would just be a matter of examining the report to see which tests are currently covering a given line of code.
A sample coverage report showing the list of tests for each line of production code is available online.

What's the Point of Selenium?

Ok, maybe I'm missing something, but I really don't see the point of Selenium. What is the point of opening the browser using code, clicking buttons using code, and checking for text using code? I read the website and I see how in theory it would be good to automatically unit test your web applications, but in the end doesn't it just take much more time to write all this code rather than just clicking around and visually verifying things work?
I don't get it...
It allows you to write functional tests in your "unit" testing framework (the issue is the naming of the latter).
When you are testing your application through the browser, you are usually testing the system fully integrated. Considering you already have to test your changes before committing them (smoke tests), you don't want to do that manually over and over.
Something really nice is that you can automate your smoke tests, and QA can augment those. Pretty effective, as it reduces duplication of effort and brings the whole team closer.
P.S. As with any practice you are using for the first time, there is a learning curve, so it usually takes longer the first few times. I also suggest you look at the Page Object pattern; it helps keep the tests clean.
Update 1: Note that the tests will also run the JavaScript on the pages, which helps with testing highly dynamic pages. Also note that you can run it with different browsers, so you can check cross-browser issues (at least on the functional side, as you still need to check the visuals).
Also note that as the amount of pages covered by tests builds up, you can create tests with complete cycles of interactions quickly. Using the Page Object pattern they look like:
LastPage aPage = somePage
.SomeAction()
.AnotherActionWithParams("somevalue")
//... other actions
.AnotherOneThatKeepsYouOnThePage();
// add some asserts using methods that give you info
// on LastPage (or that check the info is there).
// you can of course break the statements to add additional
// asserts on the multi-step story.
It is important to understand that you go gradually about this. If it is an already built system, you add tests for the features/changes you are working on, adding more and more coverage along the way. Going manual instead usually hides what you missed testing: if you made a change that affects every single page and you only check a subset (as time doesn't allow more), with automated tests you at least know which ones you actually tested, and QA can work from there (hopefully by adding even more tests).
This is a common thing that is said about unit testing in general. "I need to write twice as much code for testing?" The same principles apply here. The payoff is the ability to change your code and know that you aren't breaking anything.
Because you can repeat the SAME test over and over again.
If your application is even 50+ pages and you need to do frequent builds and test it against X number of major browsers it makes a lot of sense.
Imagine you have 50 pages, all with 10 links each, and some with multi-stage forms that require you to go through the forms, putting in about 100 different sets of information to verify that they work properly with all credit card numbers, all addresses in all countries, etc.
That's virtually impossible to test manually. It becomes so prone to human error that you can't guarantee the testing was done right, never mind what the testing proved about the thing being tested.
Moreover, if you follow a modern development model, with many developers all working on the same site in a disconnected, distributed fashion (some working on the site from their laptop while on a plane, for instance), then the human testers won't even be able to access it, much less have the patience to re-test every time a single developer tries something new.
On any decent size of website, tests HAVE to be automated.
The point is the same as for any kind of automated testing: writing the code may take more time than "just clicking around and visually verifying things work", maybe 10 or even 50 times more.
But any nontrivial application will have to be tested far more than 50 times eventually, and manual tests are an annoying chore that will likely be omitted or done shoddily under pressure, which results in bugs remaining undiscovered until just before (or after) important deadlines, which results in stressful all-night coding sessions or even outright monetary loss due to contract penalties.
Selenium (along with similar tools, like Watir) lets you run tests against the user interface of your Web app in ways that computers are good at: thousands of times overnight, or within seconds after every source checkin. (Note that there are plenty of other UI testing pieces that humans are much better at, such as noticing that some odd thing not directly related to the test is amiss.)
There are other ways to involve the whole stack of your app by looking at the generated HTML rather than launching a browser to render it, such as Webrat and Mechanize. Most of these don't have a way to interact with JavaScript-heavy UIs; Selenium has you somewhat covered here.
Selenium will record and re-run all of the manual clicking and typing you do to test your web application. Over and over.
Over time, observing myself has shown that I tend to do fewer tests and start skipping some, or forgetting about them.
Selenium will instead take each test, run it, and if it doesn't return what you expect, it can let you know.
There is an upfront cost of time to record all these tests. I would recommend it like unit tests -- if you don't have it already, start using it with the most complex, touchy, or most updated parts of your code.
And if you save those tests as JUnit classes you can rerun them at your leisure, as part of your automated build, or in a poor man's load test using JMeter.
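As a hypothetical sketch of what one of those JUnit-backed Selenium tests might look like (it assumes the Selenium WebDriver Java bindings and a ChromeDriver on the path; the URL and element IDs are invented, and the expected text echoes the ordering example in a later answer):

import static org.junit.Assert.assertTrue;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class CheckoutSmokeTest {

    private WebDriver driver;

    @Before
    public void openBrowser() {
        driver = new ChromeDriver();
    }

    @Test
    public void orderConfirmationIsShownAfterCheckout() {
        driver.get("http://localhost:8080/shop/checkout");

        driver.findElement(By.id("name")).sendKeys("Mr. Rogers");
        driver.findElement(By.id("cardNumber")).sendKeys("4111111111111111");
        driver.findElement(By.id("placeOrder")).click();

        // The same check a human would make by eye, repeatable on every build.
        assertTrue(driver.getPageSource().contains("Thanks Mr. Rogers for ordering"));
    }

    @After
    public void closeBrowser() {
        driver.quit();
    }
}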
In a past job we used it to unit test our web app. If the web app changes its look, the tests don't need to be rewritten; record-and-replay type tests, by contrast, would all need to be redone.
Why do you need Selenium? Because testers are human beings. They go home every day, can't always work weekends, take sickies, take public holidays, go on vacation every now and then, get bored doing repetitive tasks and can't always rely on them being around when you need them.
I'm not saying you should get rid of testers, but an automated UI testing tool complements system testers.
The point is the ability to automate what was before a manual and time consuming test. Yes, it takes time to write the tests, but once written, they can be run as often as the team wishes. Each time they are run, they are verifying that behavior of the web application is consistent. Selenium is not a perfect product, but it is very good at automating realistic user interaction with a browser.
If you do not like the Selenium approach, you can try HtmlUnit, I find it more useful and easy to integrate into existing unit tests.
For applications with rich web interfaces (like many GWT projects), Selenium/Windmill/WebDriver/etc. is the way to create acceptance tests. In the case of GWT/GXT, the final user interface code is JavaScript, so creating acceptance tests using normal JUnit test cases is basically out of the question. With Selenium you can create test scenarios matching real user actions and expected results.
Based on my experience with Selenium it can reveal bugs in the application logic and user interface (in case your test cases are well written). Dealing with AJAX front ends requires some extra effort but it is still feasible.
I use it to test multi-page forms, as this takes the burden out of typing the same thing over and over again. And having the ability to check if certain elements are present is great. Again, using the form as an example, your final Selenium test could check if something like, say, "Thanks Mr. Rogers for ordering..." appears at the end of the ordering process.

What is the code-coverage percentage on your project?

What is the % code-coverage on your project? I'm curious as to reasons why.
Is the dev team happy with it? If not, what stands in the way from increasing it?
Stuart Halloway is one whose projects aim for 100% (or else the build breaks!). Is anyone at that level?
We are at a painful 25% but aspire to 80-90% for new code. We have legacy code that we have decided to leave alone as it evaporates (we are actively re-writing).
We run at 85% code coverage, but falling below it does not break the build. I think using code coverage as an important metric is a dangerous practice. Just because something is covered in a test does not mean the coverage is any good. We try to use it as guidance for the areas we are weakly covered, not as a hard fact.
80% is the exit criterion for the milestone. If we don't make it through the sprint (even though we do plan the time up front), we add it during stabilization. We might take an exception for a particular component or feature, but then we open a Pri 1 item for the next milestone.
During coding, code coverage is measured automatically on the daily build and the report is sent to the whole team. Anything that falls under 70% is yellow, under 50% is red. We don't fail the build currently, but we have a plan to add this in the next milestone.
Not sure what dev happiness has to do with unit testing. Devs are hired to build a quality product, and there should be a process to enforce minimum quality and a way to measure it. If somebody is not happy about the process, they are free to suggest another way of validating their code before it is integrated with the rest of the components.
Btw, we measure code coverage on automated scenario tests as well. Thus, we have three numbers - unit, scenario and combined.
Our company goal is 80% statement coverage, including exception handling code. Personally, I like to be above 90% on all of the stuff I check in.
I often use code coverage under our automated test suite, but primarily to look for untested areas. We get about 70% coverage most of the time, and will never hit 100%, for two reasons:
1) We typically automate new functionality after the release; it is manually tested for its first release and hence not included in coverage analysis. Automation is primarily for functional regression in our case and is the best place to execute and tweak code coverage.
2) Fault injection is required to get 100% coverage, as you need to get inside exception handlers. This is difficult and time-consuming to automate. We don't currently do this and hence won't ever get 100%. James Whittaker's books on breaking software cover this subject well for anyone interested.
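A minimal, hypothetical sketch of that kind of fault injection using a mocked dependency (Mockito and JUnit; all class and method names are invented), so that the catch block actually shows up as covered:

import static org.junit.Assert.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.io.IOException;
import org.junit.Test;

public class PaymentServiceErrorPathTest {

    interface PaymentGateway {
        String charge(String account, long cents) throws IOException;
    }

    static class PaymentService {
        private final PaymentGateway gateway;
        PaymentService(PaymentGateway gateway) { this.gateway = gateway; }

        String processOrder(String account, long cents) {
            try {
                return gateway.charge(account, cents);
            } catch (IOException e) {
                // This handler is unreachable without injecting a fault.
                return "PAYMENT_FAILED";
            }
        }
    }

    @Test
    public void coversTheCatchBlockByInjectingAFault() throws Exception {
        PaymentGateway failingGateway = mock(PaymentGateway.class);
        when(failingGateway.charge("acct-1", 500L)).thenThrow(new IOException("network down"));

        PaymentService service = new PaymentService(failingGateway);

        assertEquals("PAYMENT_FAILED", service.processOrder("acct-1", 500L));
    }
}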
It is also worth remembering that code coverage does not equate to test coverage, as is regularly discussed in threads over on SQAforums. Thus 100% code coverage can be a misleading metric.
A couple of years ago I measured Perl's test coverage. By the end of 250 test cases it reached 70% of the code and 33% of fully tested branches.
Sadly, 0% at our workplace so far.
We aim to improve that, but telling the bosses we need it isn't easy, since they see testing as time not spent coding, i.e. less money.
A project I did a couple of years ago achieved 100% line coverage but I had total control over it so I could enforce the target.
We've now got an objective to have 50% of new code covered, a figure that will rise in the near future, but no way to measure it. We will soon have tools in place to measure code coverage on every nightly run of the unit tests, so I'm convinced our position will improve.