Determining which tests cover a line of code - unit-testing

Is there a way to determine the set of unit tests that will potentially execute a given line of code? In other words, can you automatically determine not just whether a given line is covered, but the actual set of tests that cover it?
Consider a big code base with, say, 50K unit tests. Clearly, it could take a LONG time to run them all--hours, if not days. Working in such a code base, you'd like to be able to execute some subset of all the unit tests, including only those that cover the line (or lines) that you just touched. Sure, you could find some manually and run those, but I'm looking for a way to do it faster, and more comprehensively.
If I'm thinking about this correctly, it should be possible. A tool could statically traverse all the code paths leading out of each unit test, and come up with a slice of the program reachable from that test. And you should then (theoretically) be able to compute the set of unit tests that include a given line in their slice, meaning that the line could be executed by that test ("could" rather than "will" because the actual code path will only be determined at run time based on the inputs or other conditions). A given line of code could have a massive number of tests that execute it (say, code in a shared library), whereas other lines might have few (or no) tests covering them.
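To make the idea concrete, here is the kind of computation I'm imagining, sketched in C++ purely for illustration (the names and the call-graph representation are made up; extracting the call graph from a real code base is exactly the part I'd want a tool for):
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// A crude static over-approximation: everything reachable from a test's entry point
// by following call-graph edges (no data flow, so it is a superset of what the test
// can actually execute at run time).
using CallGraph = std::map<std::string, std::vector<std::string>>;

std::set<std::string> reachableFrom(const CallGraph& calls, const std::string& testEntry)
{
    std::set<std::string> seen;
    std::queue<std::string> work;
    seen.insert(testEntry);
    work.push(testEntry);
    while (!work.empty())
    {
        const std::string current = work.front();
        work.pop();
        const auto it = calls.find(current);
        if (it == calls.end())
            continue;
        for (const std::string& callee : it->second)
            if (seen.insert(callee).second)   // enqueue each callee only once
                work.push(callee);
    }
    return seen;
}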
So:
Is my reasoning sound on this idea? Could it theoretically be done, or is there something I'm leaving out?
Is there already a tool out there that can do this? Or, is this a common thing with a name I haven't run into? Pointers to tools in the java world, or to general research on the subject, would be appreciated.

JetBrains' dotCover now also has this feature for .NET code. It can be accessed from the dotCover menu with the option "Show covering tests", or by pressing Ctrl + Alt + K.

I'm pretty sure Clover will show you which tests validate each line of code, so you could identify the relevant tests from the coverage reports and execute them manually. They also have a new API which you might be able to use to write an IDE plugin that would let you execute the tests that cover a line of code.

The following presentation discusses how to compute the program slice executed by a unit test. It answers the question, "Can you determine the test coverage without executing the program?", and basically sketches the idea you described, with the additional bit of work needed to actually implement it.
You might note that computing a program slice isn't a computationally cheap task. I'd guess that computing a slice (a symbolic computation) is generally slower than executing a unit test, so I'm not sure that you'll save any time doing this. And a slice is a conservative approximation of the affected part of the program, so the answer you get back will include program parts that actually don't get executed.
You might be better off instead running all those 50,000 unit tests once and collecting the coverage data for each one. Then, when some code fragment is updated, you can determine statically whether the code a particular test executes includes the code you changed, and thus identify the tests that have to be executed again. You can skip executing the rest of the tests.
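As an illustrative sketch of the selection step (my own illustration; the data structures and names here are assumptions, and a real tool would store this far more compactly): once you have recorded, per test, which lines it covered, choosing the tests to re-run is just an intersection check.
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// A covered location: (source file, line number).
using Location = std::pair<std::string, int>;

// Hypothetical store built from one full coverage run: test name -> lines it visited.
using CoverageMap = std::map<std::string, std::set<Location>>;

// Return the tests whose recorded coverage intersects the lines you just changed.
std::vector<std::string> testsToRerun(const CoverageMap& coverage,
                                      const std::set<Location>& changedLines)
{
    std::vector<std::string> selected;
    for (const auto& entry : coverage)
    {
        for (const Location& loc : changedLines)
        {
            if (entry.second.count(loc) != 0)   // this test touched a changed line
            {
                selected.push_back(entry.first);
                break;                          // no need to check the other lines
            }
        }
    }
    return selected;
}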
My company builds a family of test coverage tools. Our next release of these tools will have this kind of incremental regression testing capability.

This is a feature that the JMockit Coverage tool (for Java) provides, although it shows the tests that did cover a given line of production code in the last run, not the tests "that will potentially execute a given line of code".
Typically, however, you would have a Jenkins (or whatever) build of the project, where all tests are executed and an HTML coverage report is generated. Then it would just be a matter of examining the report to see which tests are currently covering a given line of code.
A sample coverage report showing the list of tests for each line of production code is available online.

Related

What is the difference between test cases coverage and just a console application coverage?

I am having a hard time running code coverage, since most of the tools (including the Visual Studio one) require unit tests.
I don't understand why I need to create unit tests; why can't I simply run code coverage with my own console EXE application?
Just press F5 and get the report, without putting effort into creating unit tests or whatever.
Thanks
In general, with a good test coverage tool, coverage data is collected for whatever causes execution of the code. You should be able to exercise the code by hand and get coverage data, and/or execute the code via a test case and get coverage data.
There's a question of how you trace the coverage information back to a test. Most coverage tools just collect coverage data; some other mechanism is needed to associate that coverage data with a specific test or set of tests. [Note that none of this discussion cares whether your tests pass. It's up to you whether you want to track coverage data for failed tests; this information is often useful when trying to find out why a test failed.]
You can associate coverage data with tests by convention; Semantic Designs' test coverage tools (from my company) do this. After setting up the coverage collection, you run any set of tests, by any means you desire. Each program execution produces a test coverage vector with a date stamp. You can combine the coverage vectors for such a set into a single vector (the UI helps you do this, or you can do it in a batch script by invoking the UI as a command-line tool), and then associate the set of tests you ran with the combined coverage data. Some people associate individual tests with the coverage data collected by each individual test execution. (Doing this lets you later discover whether one test covers everything another test covers, implying you don't need the second one.)
You can be more organized: configure each of your automated tests to exercise the code and automatically store the generated vector in a place unique to that test. This is just a bit of scripting added to each automated test. Some test coverage tools come with unit test mechanisms that have a way to do this. (Our tools don't insist on a specific test framework, so they work with any framework.)
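To sketch the idea in code (an illustration of the concept only, not our tools' actual storage format): a coverage vector is essentially one flag per instrumented point, combining the vectors for a set of tests is a bitwise OR, and the "one test subsumes another" check is a simple element-wise comparison.
#include <cassert>
#include <cstddef>
#include <vector>

// One flag per instrumented probe point; true means the point was visited.
using CoverageVector = std::vector<bool>;

// Combine the vectors collected for several test runs into one set-level vector.
CoverageVector combine(const std::vector<CoverageVector>& perTestVectors)
{
    assert(!perTestVectors.empty());
    CoverageVector combined(perTestVectors.front().size(), false);
    for (const CoverageVector& v : perTestVectors)
    {
        assert(v.size() == combined.size());   // all runs instrument the same points
        for (std::size_t i = 0; i < v.size(); ++i)
            combined[i] = combined[i] || v[i];
    }
    return combined;
}

// Test B is redundant, coverage-wise, if test A visits every point B visits.
bool coversEverythingOf(const CoverageVector& a, const CoverageVector& b)
{
    assert(a.size() == b.size());
    for (std::size_t i = 0; i < b.size(); ++i)
        if (b[i] && !a[i])
            return false;
    return true;
}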

How do I get SonarQube to analyse Code Smells in unit tests but not count them for coverage reports?

I have a C++ project being analysed with the commercial SonarQube plugin.
My project has, in my mind, an artificially high code coverage percentage reported, because both the "production" source code lines and the unit test code lines are counted. It is quite difficult to write many unit test lines that are not run as part of the unit testing, so they give an immediate boost to the coverage reports.
Is it possible to have the Unit Test code still analysed for Code Smells but not have it count towards the test coverage metric?
I have tried setting the sonar.tests=./Tests parameter (where ./Tests is the directory with my test code). This seems to exclude the test code from all analysis, leaving smells undetected. I would rather check that the test code is of good quality than hope it is obeying the rules applied to the project.
I tried adding sonar.test.inclusions=./Tests/* in combination with the above. However, either I got the file path syntax incorrect, or setting this variable causes a complete omission of the test code, so that it no longer appears under the 'Code' tab at all, as well as being excluded from coverage.
The documentation on Narrowing the Focus of what is analysed is not at all clear on what the expected behaviour is, at least to me. Any help would be greatly appreciated, as going through every permutation will be quite confusing.
Perhaps I should just accept that with ~300 lines of "production" code and 900 lines of stubs, mocks and unit tests, a reported value of 75% test coverage could mean running 0 lines of "production" code (the 900 test lines alone are 75% of the 1,200 total). I checked, and currently my very simple application is at about that ratio of test code to "production" code. I'd expect the ratio to move more towards 50:50 over time, but it might not.
One solution I found was to have two separate SonarQube projects for a single repository. The first you set up in the normal way, with the test code excluded via sonar.tests=./Tests. The second you make a "-test" project where you exclude all your production code.
This adds some admin and setup but guarantees that coverage for the normal project is a percentage of only the production code and that you have SonarQube Analysis performed on all your test code (which can also have coverage tracked and would be expected to be very high).
I struggle to remember where I found this suggestion a long time ago. Possibly somewhere in the SonarQube Community Forum, which is worth a look if you are stuck on something.

Use Google Test to test file output?

I'm working on a project to create a simulator (for modeling biological systems) in C++. The simulator takes an input file of parameters and then generates an output file with hundreds of molecule counts at different time points from the simulation. I'm using Google Test for all of my unit testing. I also want to include some higher level tests where I supply an input file with various model parameters and then check that the output file matches some reference file. Someone recommended using bash-tap for these higher level tests, but I'd prefer to stick to Google Test if possible. Is it possible to use Google Test for the higher level tests that I've described here?
We write CAE software (simulators) and use Google Test. We face similar issues, so hopefully you'll find the answers practical.
You can write the higher-level tests, but you will often have to do more than just EXPECT_EQ() for checking pass/fail. For example, if you had to test the connectivity of two arbitrary graphs, it can be difficult if the algorithms are allowed to vary the order of nodes. Or, if you are comparing a matrix, sometimes the rows and columns can be switched with no problem. Perhaps round-off error is OK. Be prepared to deal with these types of problems, as they will be much more of an issue with a full simulator than with a unit test.
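For the output-file-versus-reference case you describe, an ordinary test function is enough. Here is a minimal sketch; the file names, the whitespace-separated numeric format, and the tolerance are all assumptions for illustration, and it presumes the simulation has already been run to produce the output file.
#include <fstream>
#include <gtest/gtest.h>

// Hypothetical higher-level test: compare the simulator's output file against a
// known-good reference, value by value, with a tolerance instead of exact equality.
TEST(SimulationOutput, MatchesReferenceWithinTolerance)
{
    std::ifstream actual("output/molecule_counts.txt");      // produced by the simulator
    std::ifstream expected("reference/molecule_counts.txt"); // checked-in reference data
    ASSERT_TRUE(actual.is_open());
    ASSERT_TRUE(expected.is_open());

    double a = 0.0, e = 0.0;
    int index = 0;
    while (expected >> e)
    {
        ASSERT_TRUE(static_cast<bool>(actual >> a)) << "output ended early at value " << index;
        EXPECT_NEAR(a, e, 1e-6) << "mismatch at value " << index;
        ++index;
    }
    EXPECT_FALSE(static_cast<bool>(actual >> a)) << "output has extra values beyond the reference";
}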
A more practical issue is when your organization says "run all tests before you check in", or maybe the tests run every time you hit the build button. If that's the case, you need to differentiate these unit tests from the higher-level tests. We use Google Test Runner in Visual Studio, and it expects to run everything whose filename matches "*Test*". It is best to name the higher-level tests something else to be clear.
We also had to turn our entire executable into a DLL so that it could have tests run on top of it. There are other approaches (like scripting) which could be used with Google Test, but we've found the executable-as-a-dll approach to work. Our "real" product executable is simply a main() function that calls app_main() in the dll.
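To make the shape of that concrete (app_main is the name mentioned above; the rest is an assumed sketch, and on Windows you would still need the usual export/import declarations), the shipped executable reduces to a thin forwarder:
// Thin "real" product executable: all logic lives in the DLL, which the Google Test
// executable links against and exercises directly.
int app_main(int argc, char** argv);   // implemented in and exported from the product DLL

int main(int argc, char** argv)
{
    return app_main(argc, argv);
}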
And, one final tip when using the Runner: If your app gets the --gtest_list_tests argument, don't do a bunch of expensive setup:
#include <gtest/gtest.h>

int main(int argc, char** argv)
{
    ::testing::InitGoogleTest(&argc, argv);
    // Don't run the expensive setup if we are just listing tests.
    if (!::testing::GTEST_FLAG(list_tests))
    {
        // Do expensive setup stuff here.
    }
    int result = RUN_ALL_TESTS();
    if (!::testing::GTEST_FLAG(list_tests))
    {
        // Do expensive shutdown stuff here.
    }
    return result;
}

OpenCover takes much longer to run than the nunit-console

I'm trying to add unit tests to this project: https://github.com/JimBobSquarePants/ImageProcessor
When running the unit tests, they take maybe 1 or 2 minutes to run (it's an image processing library, and I don't expect them to be insanely fast).
The problem is that when I run OpenCover over these tests, they take something like 20 minutes to run.
The gist of the current unit tests is that there are a bunch of test images, and each unit test (more like integration tests, actually) reads each image, and runs a bunch of effects on it.
I'm guessing that I'm doing something wrong, but what? Why does it take so much more time under OpenCover than under the NUnit runner?
OpenCover instruments the IL of your assemblies (those for which it can find a PDB file, because that is where the file location information is kept); each sequence point (think of the places where you can put a breakpoint) and each conditional branch path then causes an action that registers the visit (and increases the visit count).
For algorithmic code you will find that running coverage on heavy integration tests is a performance issue, so make sure you only run coverage on tight integration tests or on unit tests; in your case, perhaps use small images (as previously suggested) that can test the correctness of your code.
You haven't described how you are running OpenCover (or which version - I'll assume latest) but make sure you have excluded the test assemblies and are only instrumenting the target assemblies.
Finally, OpenCover uses a few queues and threads, but if you throw a lot of data at it (due to loops, etc.) then it will need time to process that data, so it works much better on machines with 4 or more cores. When you are running your tests, have a look at Task Manager and see what is happening.
This is speculation because I don't use OpenCover, but a coverage analysis tool is supposed to instrument every line it passes through. Since you are doing image manipulation, each pixel will certainly trigger OpenCover to do some analysis on the matching code lines, and you have lots of pixels.
Let's say OpenCover takes 0.01 ms to instrument one line of code (again, pure speculation), that you are working with 1280x1024 images, and that each pixel needs 3 lines of code (cap the red channel, xor green and blue, whatever): you get 1,310,720 pixels x 3 lines x 0.01 ms, which is approximately 39 seconds. For one test.
I doubt you only have one test, so multiply this by the number of tests; you may have an idea of why it is slow.
You should perhaps try testing your algorithms on a smaller scale: unless you are doing image-wide operations (I don't see which ones?), your code doesn't need the whole image to work on. Alternatively, use smaller images?
EDIT: I had a look at the test suite here and (once again, not knowing OpenCover itself) can say that the problem comes from all the data you are testing; every single image is loaded and processed for the same tests, which is not how you want to be unit testing something.
Test loading each image type into the library's Image class, then test one rotation of an Image, one resize operation, etc. Don't test everything every time!
Since the tests are necessary, maybe you could explore the OpenCover options to exclude some data. Perhaps refining your coverage analysis by instrumenting only the outer shell of your algorithm would help. Have a look at filters to see what you could hide in order to make it run acceptably.
Alternatively, you could run the code coverage only daily, preferably at night?
I know this is a very old question, but I also ran into this issue.
Also with an image library (trimming bitmaps), I ran into very long running times for the unit tests.
It can be fixed by setting the OpenCover option '-threshold:' to (for example) 50.

How to deal with code coverage?

The other day we had a hard discussion between different developers and project leads about code coverage tools and the use of the corresponding reports.
Do you use code coverage in your projects, and if not, why not?
Is code coverage a fixed part of your builds or continuous integration,
or do you just use it from time to time?
How do you deal with the numbers derived from the reports?
We use code coverage to verify that we aren't missing big parts in our testing efforts. Once a milestone or so we run a full coverage report and spend a few days analyzing the results, adding test coverage for areas we missed.
We don't run it every build because I don't know that we would analyze it on a regular enough basis to justify that.
We analyze the reports for large blocks of unhit code. We've found this to be the most efficient use. In the past we would try to hit a particular code coverage target, but after a certain point the returns diminish sharply. Instead, it's better to use code coverage as a tool to make sure you didn't forget anything.
1) Yes we do use code coverage
2) Yes it is part of the CI build (why wouldn't it be?)
3) The important part: we don't look for 100% coverage. What we do look for is buggy/complex code; that's easy to find from your unit tests, and the devs/leads will know the delicate parts of the system. We make sure the coverage of such code areas is good and increases with time, rather than decreasing as people hack in more fixes without the requisite tests.
Code coverage tells you how big your "bug catching" net is, but it doesn't tell you how big the holes are in your net.
Use it as an indicator to gauge your testing efforts but not as an absolute metric.
It is possible to write code that will give you 100% coverage and does not test anything at all.
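A contrived illustration of that point (hypothetical function and test, written with Google Test only because it appears elsewhere on this page): the test below executes every line of the function, so it reports full coverage, yet it asserts nothing and would pass even if the function were completely wrong.
#include <gtest/gtest.h>

// Hypothetical production function.
int applyDiscount(int price, int percent)
{
    return price - (price * percent) / 100;
}

// Full coverage of applyDiscount, zero verification: the result is discarded and
// no assertion can ever fail, so the test passes even if the function is wrong.
TEST(DiscountTest, CoversEverythingChecksNothing)
{
    applyDiscount(200, 10);
}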
The way to look at code coverage is to see how much is NOT covered and find out why it is not covered. Code coverage simply tells us that lines of code are being hit when the unit tests are running. It does not tell us whether the code works correctly. 100% code coverage is a good number, but in medium/large projects it is very hard to achieve.
I like to measure code coverage on any non-trivial project. As has been mentioned, try not to get too caught up in achieving an arbitrary/magical percentage. There are better metrics, such as riskiness based on complexity, coverage by package/namespace, etc.
Take a look at this sample Clover dashboard for similar ideas.
We run it as part of the build, and we check that it does not drop below some value, like 85%.
I also generate an automatic "Top 10 largest uncovered methods" report, to know where to start covering.
Many teams switching to Agile/XP use code coverage as an indirect way of gauging the ROI of their test automation efforts.
I think of it as an experiment: there's a hypothesis that "if we start writing unit tests, our code coverage will improve", and it makes sense to collect the corresponding observation automatically via CI, report it in a graph, etc.
You use the results to detect rough spots: if the trend toward more coverage levels off at some point, for instance, you might stop to ask what's going on. Perhaps the team has trouble writing tests that are relevant.
We use code coverage to assure that we have no major holes in our tests, and it's run nightly in our CI.
Since we also have a full set of Selenium web tests that run all the way through the stack, we do an additional coverage trick:
We set up the web-application with coverage running. Then we run the full automated test battery of selenium tests. Some of these are smoke tests only.
When the full suite of tests has been run, we can identify suspected dead code simply by looking at the coverage and inspecting code. This is really nice when working on large projects, because you can have big branches of dead code after some time.
We don't really have any fixed metrics on how often we do this, but it's all set up to run with a keypress or two.
We do use code coverage; it is integrated into our nightly build. There are several tools to analyze the coverage data; commonly they report:
statement coverage
branch coverage
MC/DC coverage
We expect to reach 90%+ statement and branch coverage. MC/DC coverage, on the other hand, gives a broader picture for the test team. For the uncovered code, we expect justification records, by the way.
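As an illustration of how the three levels differ for a single decision (a hypothetical example of my own, not from our project):
// Illustrative only: one decision with two conditions.
bool shouldAlert(bool overLimit, bool alarmsEnabled)
{
    if (overLimit && alarmsEnabled)   // the decision under test
        return true;
    return false;
}

// Statement coverage: every statement runs at least once,
//   e.g. the cases (true, true) and (false, false).
// Branch coverage: the decision evaluates to both true and false,
//   which those same two cases already achieve.
// MC/DC: each condition is shown to independently flip the outcome,
//   e.g. (true, true), (false, true) and (true, false):
//   toggling one condition at a time changes the result.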
I find it depends on the code itself. I won't repeat Joel's statements from SO podcast #38, but the upshot is 'try to be pragmatic'.
Code coverage is great in core elements of the app.
I look at the code as a dependency tree. If the leaves work (e.g. basic UI, or code calling a unit-tested DAL) and I've tested them when I developed or updated them, there is a good chance they will work; and if there's a bug, it won't be difficult to find or fix, so the time taken to mock up some tests would probably be wasted. Yes, there is a risk that updates to code they depend on may affect them, but again, it's a case-by-case thing, and the unit tests for the code they depend on should cover it.
When it comes to the trunk or branches of the code, yes, code coverage of functionality (as opposed to each individual function) is very important.
For example, I recently was on a team that built an app that required a bundle of calculations to calculate carbon emissions. I wrote a suite of tests that tested each and every calculation, and in doing so was happy to see that the dependency injection pattern was working fine.
Inevitably, due to a government act change, we had to add a parameter to the equations, and all 100+ tests broke.
I realised that in updating them, over and above testing for a typo (which I could test once), I was unit/regression testing mathematics, and I ended up spending the time building another area of the app instead.
1) Yes, we do measure simple node coverage, because:
it is easy to do with our current project* (Rails web app)
it encourages our developers to write tests (some come from backgrounds where testing was ad-hoc)
2) Code coverage is part of our continuous integration process.
3) The numbers from the reports are used to:
enforce a minimum level of coverage (95%, otherwise the build fails)
find sections of code which should be tested
There are parts of the system where testing is not all that helpful (usually where you need to make use of mock-objects to deal with external systems). But generally having good coverage makes it easier to maintain a project. One knows that fixes or new features do not break existing functionality.
*Details for setting up required coverage for Rails: Min Limit 95 Ahead