We use several AST transforms in our Groovy code, such as @ToString and @EqualsAndHashCode. We use these so we don't have to maintain and test those methods ourselves. The problem is that code coverage tools (JaCoCo right now, but we're open to change if it helps) don't know these are auto-generated methods, so they make a lot of code appear uncovered even though it's not actually code we're writing.
Is there a way to exclude these from coverage metrics in any of the tools?
I guess you could argue that since we're the ones adding the annotations, we should still be testing the generated code, since a unit test shouldn't care how these methods are created, only that they work.
I had a similar issue with @Log and the conditionals it inserts into the code. That gets reported (by Cobertura) as a lack of branch coverage.
But as you said: it just reports it correctly. The code is not covered.
If you don't need the code, you should not have generated it. If you need it and aim for full test coverage, you must test it or at least "exercise" it, i.e. somehow use it from your test cases even without asserts.
From a test methodology standpoint, not covering generated code is equally questionable as using exclusion patterns. From a pragmatic standpoint, you may just want to live with it.
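If you take the "exercise it" route rather than excluding anything, a minimal sketch of such a test might look like the following (JUnit 4; the Person class here is a hypothetical stand-in for a class whose equals/hashCode/toString would really come from the AST transforms):

```java
import static org.junit.Assert.*;
import org.junit.Test;
import java.util.Objects;

public class PersonGeneratedMethodsTest {

    // Stand-in for the real Groovy class; in the project the equals/hashCode/
    // toString bodies would be generated by @EqualsAndHashCode and @ToString.
    static class Person {
        final String name;
        final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
        @Override public boolean equals(Object o) {
            return o instanceof Person
                    && ((Person) o).age == age
                    && Objects.equals(((Person) o).name, name);
        }
        @Override public int hashCode() { return Objects.hash(name, age); }
        @Override public String toString() { return "Person(" + name + ", " + age + ")"; }
    }

    @Test
    public void exercisesGeneratedValueObjectMethods() {
        Person a = new Person("Ada", 36);
        Person b = new Person("Ada", 36);
        Person c = new Person("Bob", 41);

        assertEquals(a, b);                         // equals()
        assertEquals(a.hashCode(), b.hashCode());   // hashCode()
        assertNotEquals(a, c);
        assertTrue(a.toString().contains("Ada"));   // toString()
    }
}
```

A single test like this makes the generated methods show up as covered while also checking that the annotations actually produce the value-object behaviour you rely on.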
Related
Is it a code smell to use a generated file, like a spreadsheet or XML file, as a source for comparison in Unit tests?
Say that we had to write a lot of classes that produce various XML files for later processing. The unit tests would be a large number of repetitive assertions along the lines of foo.getExpectedValue() == expectedValue. Instead of doing this, the developer chooses to use the code they are supposed to be testing to generate the XML, then copies it into test/resources, where all future tests load it into memory as an object and run their assertions against it. Is this a code smell?
There are two practices in what you describe that classify as test smells.
First, you write that the classes to be tested are used to create the XML files that are later used to judge correctness. This way you can find out whether the classes have changed, but you cannot figure out whether the results were correct in the first place.
To avoid any misunderstanding: the smell is not that generated files are used, but that the files are generated with the code under test. The only way such an approach might make sense is if the results of the initial run were subject to thorough review. And those reviews would have to be repeated whenever the files are regenerated later.
Secondly, using complete XML files for comparison (generated or not) is another test smell. The reason is that these tests are not very focused. Any change to the XML generation will lead to a failing test. That may seem like a good thing, but it applies even to all kinds of intended changes, for example changes in the indentation. Thus, you only have tests that tell you "something changed", not tests that tell you "something failed".
To have tests that tell you "something failed", you need more specific tests: for example, tests that only look at a certain portion of the generated XML. Some tests would look at the XML structure, others at the data content. You can even have tests that check the indentation. But how? You could, for example, use regular expressions to check that some interesting portion of a generated XML string looks the way you expect.
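As a rough sketch of what "more focused" could look like (JUnit 4; the XML string stands in for whatever the class under test produced, and the element names are invented for illustration), separate tests can target structure, data and formatting independently:

```java
import static org.junit.Assert.*;
import org.junit.Test;
import java.util.regex.Pattern;

public class InvoiceXmlTest {

    // Stands in for the output of the class under test.
    private final String xml =
            "<invoice>\n"
          + "  <customer id=\"42\">Ada</customer>\n"
          + "  <total currency=\"EUR\">19.99</total>\n"
          + "</invoice>";

    @Test
    public void containsACustomerElementWithAnId() {
        // structure: a customer element with a numeric id attribute
        assertTrue(Pattern.compile("<customer id=\"\\d+\">").matcher(xml).find());
    }

    @Test
    public void totalCarriesACurrencyAttribute() {
        // data content: the currency is present
        assertTrue(xml.contains("currency=\"EUR\""));
    }

    @Test
    public void usesTwoSpaceIndentation() {
        // formatting: child elements are indented by two spaces
        assertTrue(xml.contains("\n  <customer"));
    }
}
```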
Once you have more focused tests, the test results in the case of intended modifications to your code will look different: when your change is successful, only a few of the many test cases will fail, and these will be the ones that test exactly the part of the behaviour you intentionally changed. All other tests will still pass and show you that your change did not break anything unexpectedly. If, in contrast, your change was incorrect, then more or other tests than the expected ones will show you that the change had unexpected effects.
Yes, it is, and I wouldn't do that. It breaks the principles of a good test. Mainly, a good test should be:
Independent - it should not rely on other tests. If subsequent tests have to wait for a file generated by the first test, the tests are not independent.
Repeatable - these tests introduce flakiness through the file/in-memory dependency, so they may not be consistently repeatable.
Perhaps you could take a step back and ask whether you really need to unit test every generated XML file. If the file generation follows the same code execution path (with no logical difference), I wouldn't write a unit test for each case. If the XML generation is tied to a business operation, I would consider having acceptance tests instead.
If the files are small, you can use a string reader and writer. In general, one is supposed to avoid doing I/O in unit tests. However, I see nothing wrong with using files in some non-unit tests.
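A small sketch of the string-writer idea (the generator class and its names are hypothetical; the point is only that the test never touches the file system):

```java
import static org.junit.Assert.*;
import org.junit.Test;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class GreetingXmlGeneratorTest {

    // Hypothetical generator: production code would hand it a FileWriter,
    // the test hands it an in-memory StringWriter instead.
    static class GreetingXmlGenerator {
        void writeTo(Writer out, String name) throws IOException {
            out.write("<greeting><to>" + name + "</to></greeting>");
        }
    }

    @Test
    public void writesGreetingWithoutTouchingTheFileSystem() throws IOException {
        StringWriter out = new StringWriter();
        new GreetingXmlGenerator().writeTo(out, "Ada");
        assertEquals("<greeting><to>Ada</to></greeting>", out.toString());
    }
}
```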
We are currently discussing how to define our goals for code coverage in a C# project (this question is not limited to C#, though). Along the way, we found that we should exclude some code from being counted towards the coverage. The most obvious case is the tests themselves, as they have 100% coverage and should not influence the average. But there are also classes that are wrappers for system calls, which we need so that we can create mocks. They are untestable, as we don't want to test system libraries. That code still counts towards the coverage, though, and makes it hard to get past the 90% mark.
We do not want to lie to ourselves by excluding every piece of code that is untested, which would make it fairly easy to walk towards 100%.
Is there any reference, article or discussion of this topic from people with experience in this area? We would like to explore the different views, which may help us find and develop our own definition of "testable code".
Code coverage isn't a terribly useful metric. It's helpful in giving management and onboarding developers a general idea of how much you value testing, but concerning yourself about it any more is pedantic and might actually serve to distract you from writing meaningful tests.
This is my opinion, but it is not an isolated one.
Don't worry about coverage. Define a policy in terms of what your goals are, be clear and concise, make sure everyone is on the same page about the importance of unit testing, and have a code review process in place.
You don't need a tool, just some human intelligence.
Code coverage is not a metric for how complete your test suite is, but (if at all) for how incomplete it is. A high test coverage might be totally meaningless, depending on how your code coverage tool measures it.
To get a better picture of which parts of your code need more tests, I would separate the code using the means of the programming language. Any sane code coverage tool should be able to show results for these parts separately. Depending on how your application is structured, you can put the untestable code (i.e. the test suite itself, adapter code like the "wrappers for system calls", controllers, UI elements, etc.) into separate namespaces or even separate assemblies.
The Java folks can replace "namespace" with "package" and "assembly" with "jar" in the paragraph above.
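To illustrate, here is a sketch of what such adapter code might look like on the Java side (the package and class names are made up): a thin wrapper around a system call kept in its own package, so that a coverage report grouped by package, or an exclusion at that level, stays honest and easy to reason about.

```java
// Hypothetical adapter kept in its own package so coverage can be read
// (or filtered) per package, separately from the business code.
package myapp.adapters;

// The interface is what the rest of the application depends on and mocks.
interface Clock {
    long currentTimeMillis();
}

// Thin wrapper around the system call; it has no logic of its own worth
// unit testing, it only exists to make callers mockable.
public class SystemClock implements Clock {
    @Override
    public long currentTimeMillis() {
        return System.currentTimeMillis();
    }
}
```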
Sometimes, I see if-statements that could be written in a better way. Usually these are cases where we have several layers of nested if-statements and I've identified a simpler way of rewriting the block of if-statements.
Of course the biggest concern is that the resulting code will have a different code flow in certain cases.
How can I compare the two code-blocks and determine if the code flow is the same or different?
Is there a way to support this analysis with static analysis tools? Are there any other techniques that might help?
Find some way to exercise all possible paths through the code that you want to refactor. You could:
write unit tests by hand
use Daikon (http://plse.cs.washington.edu/daikon/), which exercises code automatically and systematically to infer invariants (I haven't used it myself, but I have tried a commercial descendant targeted at Java)
Either way, use a code coverage tool to verify that you have complete statement and decision coverage. Use a coverage tool that reports the number of times each statement is executed during the coverage run. You might even be able to get trucov, which actually generates diagrams of code paths, to work.
Do your refactoring.
Run the coverage tool again and compare statement execution counts before and after the refactoring. If any statement execution count changed, the flow must have changed. The opposite isn't guaranteed to be true, but it's probably close enough to true for practical applications. Alternatively, if you got trucov to work, compare execution graphs before and after; that would be definitive.
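As a toy illustration (JUnit 4; the shipping rule is invented), a complementary way to gain confidence is to keep the original and the proposed rewrite side by side for a while and assert that they agree on every input combination. The same test is then what you would run under the coverage tool before and after switching the callers over, comparing execution counts as described above.

```java
import static org.junit.Assert.*;
import org.junit.Test;

public class ShippingRuleTest {

    // Original, nested version.
    static boolean nested(boolean premium, double orderTotal) {
        if (premium) {
            if (orderTotal > 0) {
                return true;        // premium customers always ship free
            }
            return false;
        } else {
            if (orderTotal >= 100) {
                return true;        // others need a large enough order
            }
            return false;
        }
    }

    // Proposed flat rewrite that should preserve the flow.
    static boolean flat(boolean premium, double orderTotal) {
        if (premium && orderTotal > 0) return true;
        if (!premium && orderTotal >= 100) return true;
        return false;
    }

    @Test
    public void bothVersionsAgreeOnEveryPath() {
        double[] totals = { 0, 50, 100, 250 };
        for (boolean premium : new boolean[] { true, false }) {
            for (double total : totals) {
                assertEquals(nested(premium, total), flat(premium, total));
            }
        }
    }
}
```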
It is possible, albeit counterproductive, to write a unit test which executes some code, and asserts truth. This is a deliberately extreme, and simplified example - I'm sure most people have come across tests which execute code without actually making use of it.
Are there any code coverage tools which assess whether code covered is actually used as part of the assertions of the test?
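To make it concrete, here is a deliberately silly sketch of what I mean (JUnit 4, made-up names): the production code runs, so the lines count as covered, but nothing about the result is checked.

```java
import static org.junit.Assert.*;
import org.junit.Test;

public class VacuousCoverageTest {

    // Production-style code whose lines will be reported as covered.
    static int discount(int total) {
        return total >= 100 ? total / 10 : 0;
    }

    @Test
    public void executesButVerifiesNothing() {
        discount(150);        // return value is silently dropped
        assertTrue(true);     // always passes, asserts nothing useful
    }
}
```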
What you want, in essence, is to compute the intersection between the covered code and a backward code slice (minus the unit test itself) on the assertion in the covered test.
If that intersection is empty, the assertion doesn't test any part of the application.
Most code coverage tools don't compute slices, so I'd guess the answer to your question is "no".
You would need some kind of dependency analysis, i.e. to find out which statements the assertion depends on. Then this would have to be cross-checked with the coverage. CodeSurfer is one commercial tool that does dependency analysis for C.
Once again, I'd first investigate why this is happening. Tests added just for the sake of increasing coverage numbers are often a symptom of a more serious cause. Educating everyone instead of picking out specific offenders usually works out better.
For identifying mistakes that have already been made, you can use a static code analysis tool that checks that:
each test has at least ONE assert. Of course, this can also be defeated by rogue asserts.
there are no unused return values.
I saw the first check in something called TestLint from Roy Osherove; for the second, Parasoft or similar tools come to mind.
Still, this isn't a foolproof method, and you will sink significant time poring through the issues the static analysis reports. But that's the best I can think of.
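If you want to experiment before buying into a tool, a very naive textual sketch of the "at least one assert" check could look like this (the class name and the default path are made up; it is a rough heuristic over the test sources, not real static analysis, and is easily fooled by nested braces or helper methods):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.regex.*;
import java.util.stream.Stream;

public class AssertLessTestFinder {

    // Matches a simple @Test method header followed by a brace-free body.
    private static final Pattern TEST_METHOD = Pattern.compile(
            "@Test\\s+public\\s+void\\s+(\\w+)\\s*\\([^)]*\\)[^{]*\\{([^}]*)\\}");

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "src/test/java");
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(p -> p.toString().endsWith(".java")).forEach(p -> {
                try {
                    Matcher m = TEST_METHOD.matcher(Files.readString(p));
                    while (m.find()) {
                        String body = m.group(2);
                        // Flag test methods whose body never mentions an
                        // assertion or a mock verification.
                        if (!body.contains("assert") && !body.contains("verify")) {
                            System.out.println(p + ": " + m.group(1) + " has no assert");
                        }
                    }
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
        }
    }
}
```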
I'm preparing some educational/training material with respect to Unit Testing, and want to double check some vocabulary.
In the example I'm using, the developer has tested a Facade for all possible inputs but hasn't tested the more granular units 'behind' it.
Would one still say that the tests have "full coverage" - given they cover the entire range of inputs? I feel like "full coverage" is generally used to denote coverage of code/units... but there would certainly be full something when testing all possible inputs.
What's the other word I'm looking for?
If all possible inputs don't give you 100% code coverage, you have 100% scenario coverage, but not complete code coverage.
On that note, if you have 100% scenario coverage without full code coverage, you have dead code and you should think real hard about why it exists.
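A tiny sketch of that distinction (made-up names): the facade has only two possible inputs and both are tested, so scenario coverage is complete, yet a branch in the unit behind it is never reached and shows up as uncovered, and in this case dead, code.

```java
import static org.junit.Assert.*;
import org.junit.Test;

public class GreetingFacadeTest {

    static class GreetingFacade {
        String greet(boolean formal) {
            return Greeter.greeting(formal ? "formal" : "casual");
        }
    }

    static class Greeter {
        static String greeting(String style) {
            if ("formal".equals(style)) return "Good day";
            if ("casual".equals(style)) return "Hi";
            return "Hello";   // unreachable through the facade: dead code
        }
    }

    @Test
    public void coversEveryPossibleInputOfTheFacade() {
        GreetingFacade facade = new GreetingFacade();
        assertEquals("Good day", facade.greet(true));   // scenario 1
        assertEquals("Hi", facade.greet(false));        // scenario 2
    }
}
```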
If you decide to use 'full coverage', you might run into trouble, because most literature that speaks about coverage (and indeed the tools that measure it) talks about the lines of the code under test that are executed once all tests have run.
The test cases that you propose would be said to cover the domain of the function (and assuming a function that is at least 1-to-1, they will cover the range as well).
It is full code coverage of the involved classes, but clearly not of the full system source. Tools can report this at different levels.
Note that it doesn't guarantee the code is correct, as a scenario that needs to be handled might have been missed altogether (both in the tests and in the feature code). Additionally, the tests could trigger all code paths without asserting correctly.