Background
I am aware of the principles of TDD (Test-Driven Development) and unit testing, as well as of different coverage metrics. Currently, I am working on a Linux C/C++ project where 100% branch coverage should be reached.
Question
Does anybody know a technique/method to automatically identify the unit test cases that contribute most to reaching a specific coverage goal? Each unit test could then be associated with a contribution rate (in percent). Having these numbers, the unit test cases could be ordered by their contribution rate.
The greedy algorithm can help here. In simple words (a small sketch follows the steps below):
From all tests, select the one with the highest coverage.
Calculate the coverage delta between each remaining candidate and the tests already selected.
Pick the candidate that gives the biggest delta.
Repeat from step 2 until all tests are ranked.
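As a rough illustration, here is a minimal Python sketch of that greedy ranking. It assumes each test comes with the set of coverage units (lines, branches, ...) it hits; the test names and numbers are made up.

    def rank_tests_by_coverage(coverage_by_test):
        """Order tests so that each one adds the most not-yet-covered units."""
        remaining = dict(coverage_by_test)
        covered = set()
        ranking = []
        total = len(set().union(*coverage_by_test.values())) or 1
        while remaining:
            # Pick the candidate with the largest coverage delta.
            name, units = max(remaining.items(),
                              key=lambda item: len(item[1] - covered))
            delta = units - covered
            ranking.append((name, 100.0 * len(delta) / total))
            covered |= units
            del remaining[name]
        return ranking

    tests = {
        "test_parse": {1, 2, 3, 4},
        "test_format": {3, 4, 5},
        "test_error": {6},
    }
    for name, contribution in rank_tests_by_coverage(tests):
        print(f"{name}: +{contribution:.0f}%")

The percentage printed for each test is exactly the contribution rate asked about in the question: the share of total coverage that the test adds on top of the tests ranked before it.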
As a result you'll get an ordering like the one Squish Coco generates for GNU coreutils.
Typically the benefit of each extra test goes down the more tests you add. Some of them may even have zero contribution to the total coverage.
A good use case for this ordering is finding an optimal execution order for smoke tests that only have limited time to run. For complete testing you should, of course, always run the whole suite.
Related
I'm trying to add unit tests to this project: https://github.com/JimBobSquarePants/ImageProcessor
When running the unit tests, they take maybe 1 or 2 minutes to run (it's an image processing library, and I don't expect them to be insanely fast).
The problem is that when I run OpenCover over these tests, they take something like 20 minutes to run.
The gist of the current unit tests is that there are a bunch of test images, and each unit test (more like an integration test, actually) reads each image and runs a bunch of effects on it.
I'm guessing that I'm doing something wrong, but what? Why does it take so much more time under OpenCover than under the NUnit runner?
OpenCover instruments the IL of your assemblies (those for which it can find a PDB file, because that is where the source location information is kept); for each sequence point (think of the places where you can put a breakpoint) and each conditional branch path it inserts an action that registers the visit (and increases the visit count).
For algorithmic code, running coverage on heavy integration tests will be a performance issue, so make sure you only run coverage on tight integration tests or on unit tests; in your case, perhaps use small images (as previously suggested) that can still test the correctness of your code.
You haven't described how you are running OpenCover (or which version; I'll assume the latest), but make sure you have excluded the test assemblies and are only instrumenting the target assemblies.
Finally, OpenCover uses a few queues and threads, but if you throw a lot of data at it (due to loops etc.) it will need time to process that data, so it works much better on machines with 4 or more cores. While your tests are running, have a look at Task Manager and see what is happening.
This is speculation because I don't use OpenCover, but a coverage analysis tool is supposed to instrument all lines it passes through. Since you are doing image manipulation, each pixel will certainly trigger OpenCover to do some analysis on the matching code lines, and you have lots of pixels.
Let's say OpenCover takes 0.01 ms to instrument one line of code (again, this is pure speculation), that you are working with 1280×1024 images, and that each pixel needs 3 lines of code (cap red channel, xor green and blue, whatever): you get 1,310,720 × 3 × 0.01 ms ≈ 39 seconds. For one test.
I doubt you only have one test, so multiply this by the number of tests and you may get an idea of why it is slow.
You should perhaps try testing your algorithms on a smaller scale: unless you are doing image-wide operations (and I don't see which ones those would be), your code doesn't need the whole image to work on. Alternatively, use smaller images?
EDIT: I had a look at the test suite here and (once again, not knowing OpenCover itself) can say that the problem comes from all the data you are testing; every single image is loaded and processed for the same tests, which is not how you want to be unit testing something.
Test loading each image type into the Image class of the lib, then test one rotation from an Image class, one resize operation, etc. Don't test everything every time!
Since the tests are necessary, maybe you could explore the OpenCover options to exclude some data. Perhaps refining your coverage analysis by instrumenting only the outer shell of your algorithm would help. Have a look at filters to see what you could hide in order to make it run acceptably.
Alternatively, you could run the code coverage only daily, preferably at night?
I know this is a very old issue, but I ran into it as well.
Also with an image library (trimming bitmaps), I ran into very long running times for the unit tests.
It can be fixed by setting OpenCover's '-threshold:' option to (for example) 50.
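For reference, a sketch of what such an invocation might look like, combining the filtering advice above with the threshold option. The assembly and runner names are placeholders, and the flags (-register, -target, -targetargs, -filter, -threshold, -output) should be checked against the OpenCover documentation for your version:

    OpenCover.Console.exe ^
      -register:user ^
      -target:"nunit3-console.exe" ^
      -targetargs:"MyLibrary.Tests.dll" ^
      -filter:"+[MyLibrary]* -[MyLibrary.Tests]*" ^
      -threshold:50 ^
      -output:coverage.xml

The -filter argument instruments only the target assembly and excludes the test assembly; -threshold caps the number of visits recorded per point, which is what cuts the runtime in loop-heavy code.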
Our company is trying to enforce test-driven development, and as a development manager I'm trying to define what that acceptance criterion really means. We mostly follow an agile methodology, and each story going to test needs some level of assurance (entrance criteria) of unit test coverage. I'm interested to hear how you effectively enforce this (if you do) as a gate within your companies.
What you don't want is to set any code coverage requirements. Any requirement like that can and will be gamed.
Instead, I'd look at measuring RTF: Running, Tested Features. See http://xprogramming.com/articles/jatrtsmetric/
For our Ruby on Rails app, we use a code metrics gem called SimpleCov. I am not sure what language your company uses, but I am sure there is a comparable tool for it. SimpleCov is great for Ruby because it provides an extensive GUI, highlighting down to the line whether code was covered, skipped (filtered out), or missed.
We started tracking our code coverage two months ago. We began at 30%, and are now near 60%. Depending on the age of your company's application, you may want to raise your coverage expectations to 80% or higher... According to SimpleCov, anything 91% or higher is "in the green", and below 80% is "in the red" (for great color analogies).
I feel that the most important thing is to make sure you have your crucial features tested -- such features may have the most lines of code to be tested. Getting those done first will drastically increase coverage.
Another thing to note, if you use a library like SimpleCov, you may be able to skip (filter out) lines of code, or even entire files, that you feel are legacy and may lower your coverage. That is another reason why our coverage almost doubled in 2 months.
Again, we are new to measuring code coverage, but strongly believe in its benefit to our current testing suite and application development.
Is there a sane way to unit test a stochastic process? For example, say you have coded a simulator for a specific system model. The simulator works randomly, based on the seeds of its RNGs, so the state of the system cannot be predicted; and even if it could be, every test would have to bring the system to a specific state before it attempts to test any method of a class. Is there a better way to do this?
The two obvious choices are to remove the randomness (that is, use a fixed, known seed for your unit tests and proceed from there), or to test statistically (that is, run the same test case a million times and verify that the mean and variance (etc.) match expectations). The latter is probably a better test of your system, but you'll have to live with some false alarms.
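A minimal sketch of the first option, assuming the simulator accepts an injected RNG (the Simulator class here is a made-up stand-in, not any real code): fix the seed and the whole trajectory becomes deterministic and repeatable.

    import random
    import unittest

    class Simulator:
        """Toy stand-in for a stochastic simulator that takes an injected RNG."""
        def __init__(self, rng):
            self.rng = rng
            self.state = 0

        def step(self):
            # A trivial random walk standing in for the real model.
            self.state += self.rng.choice([-1, 1])
            return self.state

    class SimulatorSeedTest(unittest.TestCase):
        def test_same_seed_gives_same_trajectory(self):
            # With a fixed, known seed the run is fully reproducible,
            # so exact assertions against a recorded trajectory are possible.
            sim_a = Simulator(random.Random(42))
            sim_b = Simulator(random.Random(42))
            run_a = [sim_a.step() for _ in range(100)]
            run_b = [sim_b.step() for _ in range(100)]
            self.assertEqual(run_a, run_b)

    if __name__ == "__main__":
        unittest.main()

Injecting the RNG also makes it easy to substitute a stub that returns a scripted sequence, which is the "controlled randomness" idea mentioned in the next answer.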
Here's a nice blog post that covers this topic. Basically you will need to inject a controlled randomness into the object under test.
Maybe you could use JUnit Theories to solve that.
http://blogs.oracle.com/jacobc/entry/junit_theories
You need to find the Q0 and p00 for the system: p00 is the predicted state, while Q0 is the calculated state. The predicted state can lead to finding the recurrent state of the system, which is the smallest value, say k, in the system.
If your model is stochastic, then you could treat the output as a randomly generated sample. Then, in your unit testing function, you could perform some sort of hypothesis test with a confidence interval. If the test output is within the confidence bound, the test is successful. However, there will be some possibility of false positives/false negatives.
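A hedged sketch of such a check, with a Bernoulli(0.5) process standing in for the real model and a tolerance of roughly four standard errors (both choices are illustrative):

    import math
    import random
    import unittest

    class StochasticOutputTest(unittest.TestCase):
        def test_mean_within_confidence_bound(self):
            rng = random.Random()      # deliberately unseeded: we test the distribution
            n = 100_000
            expected_p = 0.5           # expected success probability of the process
            sample = [1 if rng.random() < expected_p else 0 for _ in range(n)]
            observed = sum(sample) / n
            # About four standard errors: wide enough that false alarms are rare,
            # but a statistical test can never rule them out entirely.
            tolerance = 4 * math.sqrt(expected_p * (1 - expected_p) / n)
            self.assertAlmostEqual(observed, expected_p, delta=tolerance)

    if __name__ == "__main__":
        unittest.main()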
The other day we had a hard discussion among developers and project leads about code coverage tools and the use of the corresponding reports.
Do you use code coverage in your projects, and if not, why not?
Is code coverage a fixed part of your builds or continuous integration, or do you just use it from time to time?
How do you deal with the numbers derived from the reports?
We use code coverage to verify that we aren't missing big parts in our testing efforts. Once a milestone or so we run a full coverage report and spend a few days analyzing the results, adding test coverage for areas we missed.
We don't run it every build because I don't know that we would analyze it on a regular enough basis to justify that.
We analyze the reports for large blocks of unhit code. We've found this to be the most efficient use. In the past we would try to hit a particular code coverage target, but past a certain point the returns diminish rapidly. Instead, it's better to use code coverage as a tool to make sure you didn't forget anything.
1) Yes we do use code coverage
2) Yes it is part of the CI build (why wouldn't it be?)
3) The important part: we don't look for 100% coverage. What we do look for is buggy/complex code; that's easy to find from your unit tests, and the devs/leads will know the delicate parts of the system. We make sure the coverage of such code areas is good and increases with time, rather than decreasing as people hack in more fixes without the requisite tests.
Code coverage tells you how big your "bug catching" net is, but it doesn't tell you how big the holes are in your net.
Use it as an indicator to gauge your testing efforts but not as an absolute metric.
It is possible to write tests that give you 100% coverage and do not verify anything at all.
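As a small hypothetical illustration: the test below executes every line of the function, so line coverage reports 100%, yet it verifies nothing.

    def discount(price, percent):
        # Clamp the discount and apply it.
        if percent > 100:
            percent = 100
        return price * (1 - percent / 100)

    def test_discount_full_coverage_no_assertions():
        discount(100, 10)    # takes the normal path
        discount(100, 150)   # takes the clamping path
        # No assertions: the test passes even if discount() returns nonsense.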
The way to look at code coverage is to see how much is NOT covered and find out why. Code coverage simply tells us that lines of code are being hit when the unit tests are running; it does not tell us whether the code works correctly. 100% code coverage is a good number, but in medium/large projects it is very hard to achieve.
I like to measure code coverage on any non-trivial project. As has been mentioned, try not to get too caught up in achieving an arbitrary/magical percentage. There are better metrics, such as riskiness based on complexity, coverage by package/namespace, etc.
Take a look at this sample Clover dashboard for similar ideas.
We do it in a build, and we check that it does not drop below some value, like 85%.
I also automatically generate a Top 10 of the largest uncovered methods, so I know where to start adding coverage.
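A rough sketch of how such a Top 10 could be produced; the report format (a CSV with a method name and its count of uncovered lines) is purely hypothetical, but most coverage tools can export something along these lines.

    import csv

    def top_uncovered_methods(report_path, limit=10):
        """Return the methods with the most uncovered lines, largest first."""
        with open(report_path, newline="") as handle:
            rows = csv.DictReader(handle)  # expected columns: method, uncovered_lines
            methods = [(row["method"], int(row["uncovered_lines"])) for row in rows]
        return sorted(methods, key=lambda item: item[1], reverse=True)[:limit]

    for method, missing in top_uncovered_methods("coverage_by_method.csv"):
        print(f"{missing:5d} uncovered lines in {method}")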
Many teams switching to Agile/XP use code coverage as an indirect way of gauging the ROI of their test automation efforts.
I think of it as an experiment - there's an hypothesis that "if we start writing unit tests, our code coverage will improve" - and it makes sense to collect the corresponding observation automatically, via CI, report it in a graph etc.
You use the results to detect rough spots: if the trend toward more coverage levels off at some point, for instance, you might stop to ask what's going on. Perhaps the team has trouble writing tests that are relevant.
We use code coverage to assure that we have no major holes in our tests, and it's run nightly in our CI.
Since we also have a full set of selenium-web tests that run all the way through the stack we also do an additional coverage trick:
We set up the web-application with coverage running. Then we run the full automated test battery of selenium tests. Some of these are smoke tests only.
When the full suite of tests has been run, we can identify suspected dead code simply by looking at the coverage and inspecting code. This is really nice when working on large projects, because you can have big branches of dead code after some time.
We don't really have any fixed metrics on how often we do this, but it's all set up to run with a keypress or two.
We do use code coverage; it is integrated into our nightly build. There are several tools to analyze the coverage data; commonly they report
statement coverage
branch coverage
MC/DC coverage
We expect to reach 90%+ statement and branch coverage. MC/DC coverage, on the other hand, gives the test team a broader picture. For the uncovered code, we expect justification records, by the way.
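To illustrate how these metrics differ, here is a small made-up Python example: a single compound condition already needs different test sets for statement, branch and MC/DC coverage.

    def release_valve(pressure_high, override):
        # Decision with two conditions: pressure_high AND (not override).
        if pressure_high and not override:
            return "open"
        return "closed"

    # Statement coverage: the call (True, False) plus any call that skips the
    # "open" branch, e.g. (False, False), already executes every line.
    # Branch coverage: both outcomes of the if must be taken; the same two
    # calls happen to achieve that here.
    # MC/DC: each condition must be shown to independently change the outcome:
    #   (True,  False) -> "open"    baseline
    #   (False, False) -> "closed"  flipping pressure_high alone changes the result
    #   (True,  True)  -> "closed"  flipping override alone changes the result
    # so MC/DC needs at least three cases where branch coverage needed only two.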
I find it depends on the code itself. I won't repeat Joel's statements from SO podcast #38, but the upshot is 'try to be pragmatic'.
Code coverage is great in core elements of the app.
I look at the code as a tree of dependencies: if the leaves work (e.g. basic UI or code calling a unit-tested DAL) and I've tested them when I developed or updated them, there is a large chance they will work; and if there's a bug, it won't be difficult to find or fix, so the time taken to mock up some tests would probably be time wasted. Yes, there is the issue that updates to code they depend on may affect them, but again, it's a case-by-case thing, and the unit tests for the code they depend on should cover it.
When it comes to the trunk or branches of the code, yes, code coverage of functionality (as opposed to each function) is very important.
For example, I recently was on a team that built an app that required a bundle of calculations to calculate carbon emissions. I wrote a suite of tests that tested each and every calculation, and in doing so was happy to see that the dependency injection pattern was working fine.
Inevitably, due to a government act change, we had to add a parameter to the equations, and all 100+ tests broke.
I realised that to update them, over and above testing for a typo (which I could test once), I would be unit/regression testing mathematics, so I ended up spending the time building another area of the app instead.
1) Yes, we do measure simple node coverage, because:
it is easy to do with our current project* (Rails web app)
it encourages our developers to write tests (some come from backgrounds where testing was ad-hoc)
2) Code coverage is part of our continuous integration process.
3) The numbers from the reports are used to:
enforce a minimum level of coverage (95% otherwise the build fails)
find sections of code which should be tested
There are parts of the system where testing is not all that helpful (usually where you need to make use of mock-objects to deal with external systems). But generally having good coverage makes it easier to maintain a project. One knows that fixes or new features do not break existing functionality.
*Details for setting up required coverage for Rails: Min Limit 95 Ahead
What is the % code-coverage on your project? I'm curious as to reasons why.
Is the dev team happy with it? If not, what stands in the way from increasing it?
Stuart Halloway is one whose projects aim for 100% (or else the build breaks!). Is anyone at that level?
We are at a painful 25% but aspire to 80-90% for new code. We have legacy code that we have decided to leave alone as it evaporates (we are actively re-writing).
We run at 85% code coverage, but falling below it does not break the build. I think using code coverage as an important metric is a dangerous practice. Just because something is covered by a test does not mean the coverage is any good. We try to use it as guidance for the areas where we are weakly covered, not as a hard fact.
80% is the exit criterion for the milestone. If we don't make it through the sprint (even though we do plan the time up front), we add it during stabilization. We might take an exception for a particular component or feature, but then we open a Pri 1 item for the next milestone.
During coding, code coverage is measured automatically on the daily build and the report is sent to the whole team. Anything that falls under 70% is yellow, under 50% is red. We don't fail the build currently, but we have a plan to add this in the next milestone.
Not sure what dev happiness has to do with unit testing. Devs are hired to build a quality product, and there should be a process to enforce minimum quality and a way to measure it. If somebody is not happy with the process, they are free to suggest another way of validating their code before it is integrated with the rest of the components.
Btw, we measure code coverage on automated scenario tests as well. Thus, we have three numbers: unit, scenario and combined.
Our company goal is 80% statement coverage, including exception handling code. Personally, I like to be above 90% on all of the stuff I check in.
I often use code coverage under our automated test suite, but primarily to look for untested areas. We get about 70% coverage most of the time, and will never hit 100% for two reasons:
1) We typically automate new functionality after the release; it is manually tested for its first release and hence not included in coverage analysis. Automation is primarily for functional regression in our case and is the best place to execute and tweak code coverage.
2) Fault injection is required to get 100% coverage, as you need to get inside exception handlers. This is difficult and time-consuming to automate. We don't currently do this and hence won't ever get 100%. James Whittaker's books on breaking software cover this subject well for anyone interested.
It is also worth remembering that code coverage does not equate to test coverage, as is regularly discussed in threads such as this and this over on SQAForums. Thus 100% code coverage can be a misleading metric.
A couple of years ago I measured Perl's test coverage. By the end of 250 test cases it reached 70% of the code and 33% of fully tested branches.
Sadly, 0% at our workplace so far.
We will aim to improve that, but telling the bosses that we need it isn't easy, since they see testing != coding, i.e. less money.
A project I did a couple of years ago achieved 100% line coverage but I had total control over it so I could enforce the target.
We've now got an objective to have 50% of new code covered, a figure that will rise in the near future, but currently no way to measure it. We will soon have tools in place to measure code coverage on every nightly run of the unit tests, so I'm convinced our position will improve.