Jenkins Cobertura (with gcov) - what do the coverage statistics mean?

I'm currently writing unit tests for a Qt project. I wanted to use the statistics provided in Jenkins through the Cobertura plugin (gcov is used underneath to produce the statistics).
:~$ gcov -v
gcov 5.4.0 20160609
:~$ gcc -v
gcc version 5.4.0
However, after I looked at the table (see below), I was really surprised by the poor coverage, especially of conditionals. For the first file (see Coverage Breakdown by File) I thought I was actually done, since the code has only three ifs (each with a single condition) and my tests cover all of them (I also checked this through debugging, just to make sure). So I am really confused about what these numbers actually mean and how to interpret them in order to make my unit tests better.
I've even started thinking that some of the poor results might be due to the use of Qt, since it's not exactly pure C++, and all the "extras" (slots, signals, MOC files etc.) might be something that gcov can't handle properly.

Checking the annotated source listing, with its red/green coverage markings, should help.
The numbers inside the coloured bars are line counts, so 47/108 means that 47 of the 108 lines of code that are controlled by conditionals have coverage.
For each conditional you need at least two unit tests: one for each branch.
If there are && or || in the conditions - or anywhere else (e.g. in a logical expression) - then each combination must be exercised to achieve 100%.
Also don't forget
a = (j == 0) ? c : d;
requires (at least) two tests!
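To make that concrete, here is a hedged sketch (hypothetical code, not from the question) showing where the extra branch outcomes come from:
// Hypothetical example, not taken from the question.
int pick(int j, bool enabled, int c, int d)
{
    if (j > 0 && enabled)   // gcov sees branches at "j > 0" and at "enabled";
        ++c;                // covering them all needs: j <= 0,
                            // j > 0 with enabled == false,
                            // and j > 0 with enabled == true
    return (j == 0) ? c : d;    // two more branches: j == 0 and j != 0
}
A test suite that only ever calls pick() with j > 0 and enabled == true executes every line, yet leaves several branches untaken - which is exactly the kind of gap that shows up as poor conditional coverage in the Cobertura table.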
Also, if using C++, see Why gcc 4.1 + gcov reports 100% branch coverage and newer (4.4, 4.6, 4.8) reports 50% for "p = new class;" line?

Related

Python Sub-Process Coverage

Situation:
I'm attempting to get coverage reports for a project that uses both C++ and Python. I'm using LCOV/GCOV for C++, and attempting to use Coverage.py for the Python side. The only issue is that most of the Python code being used is simply utility functions, called one function at a time: no initialization, no real life-cycle or exit. So there is no real way to use the API to start/stop/save, or to use the coverage command line to measure.
With this in mind, I thought the easiest way to accomplish it would be the sitecustomize.py method, as outlined here. I have gotten that to work, and it measures all configured Python code as expected. Now I'm looking at how to accomplish this with compiled Python code (.pyc).
I can get it to work if I keep the source (.py) and compiled (.pyc) files in the same directory when running and then reporting. However, I'm looking for a way to RUN the files and generate the measurement data, then at a later time point to the actual source files and run the actual reports. Ideally I wouldn't need the source (.py) files at all, but I haven't found a way to accomplish this.
Objective:
In the end I want to be able to compile the Python files (.pyc), install them on the target, and run coverage as stated above. That will generate coverage data files; I then pull those files to my host machine, which houses the source (.py), and do the actual coverage reporting.
Is this possible currently?
[Edit] Thanks to Ned's advice, I looked into the [paths] usage, and it worked exactly how I needed it to.
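For anyone else landing here: a minimal sketch of what such a [paths] section can look like in .coveragerc (the directory names are made up; the first entry is treated as the canonical source location and the later ones are remapped onto it when the data is combined):
[paths]
source =
    src/python/
    /opt/app/lib/python/
With something like that in place, coverage data collected on the target against the installed files can be reported on the host against the .py sources.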

How to exclude test paths from cppcheck analysis?

I'm trying to run a cppcheck analysis over my code, which has the following file structure:
/code/module_1/src/a.cpp
/code/module_1/src/b.cpp
/code/module_1/test/c.cpp
/code/module_2/src/d.cpp
/code/module_2/src/e.cpp
/code/module_3/test/f.cpp
I'd like to run an analysis excluding all test code. Is this possible with a command like "cppcheck -itest"? It doesn't work for me, although I think it should, according to the docs:
...Directory name is matched to all parts of the path.
I'm using version 1.69. I know I could list every test directory separately (which does work, I checked), but there are too many modules to do that reasonably for the many analyses we run.
Is this possible?
I installed Cppcheck to do some tests and it seems the -i implementation is a bit bonkers. However, I managed to achieve what you want.
Solution: use -itest\ instead of -itest (this was in Windows; maybe Linux needs -itest/)
Rationale: in my tests, -itest worked only if there was a .\test\ directory, in which case even .\a\test\a.cpp was excluded. With -itest\, however, the exclusion took place regardless of whether a .\test\ directory was present.
This seems like a bug which the developers ought to weed out, but, in the meantime, you can succeed using the above workaround.
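Applied to the directory layout from the question, the workaround would look something like this (the trailing separator is the important part; the Linux form is the answer's own guess, not verified):
cppcheck -itest\ /code     (Windows)
cppcheck -itest/ /code     (Linux, possibly)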
This is a late response to an old question, but perhaps this will help other latecomers like myself.
Disclaimer: This answer is for Windows.
It seems as if v1.79 has remedied the OP's issue. The following command line syntax has worked for me:
cppcheck -itest code
In this example, "-itest" weeds out any occurrence of the "test" directory, as originally (and correctly) assumed by the OP. In addition, the code folder is found next to the cppcheck.exe. This will be the root of the recursive source-code scan.
I'd use something like:
cppcheck /code/module_1/src /code/module_2/src /code/module_3/src

CPP unit setup for C++

In CppUnit we run unit tests as part of the build, in a post-build step, and we will be running multiple tests as part of this. If any test case fails, the post-build step should not stop; it should go ahead, run all the test cases, and report a summary of how many test cases passed and failed. How can we achieve this?
Thanks!
The question is specific enough. You need a test runner. Encapsulate each test in its own behavior and class. The test project is kept separate from the code under test. Afterwards, just configure your XmlOutputter. You can find an excellent example of how to do this on the YoLinux website: http://www.yolinux.com/TUTORIALS/CppUnit.html
This is how we compile the test projects for our main projects and check that everything is OK. After that, it is just a matter of maintaining your test code.
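A minimal sketch of such a runner, following the pattern from the YoLinux tutorial (the file name results.xml is a placeholder): it executes every registered test even when some of them fail, writes an XML summary, and only the process exit code tells the build whether everything passed.
#include <fstream>
#include <cppunit/extensions/TestFactoryRegistry.h>
#include <cppunit/ui/text/TestRunner.h>
#include <cppunit/XmlOutputter.h>

int main()
{
    // Picks up every suite registered with CPPUNIT_TEST_SUITE_REGISTRATION.
    CppUnit::TextUi::TestRunner runner;
    runner.addTest(CppUnit::TestFactoryRegistry::getRegistry().makeTest());

    // Write an XML summary (pass/fail per test) for the post-build step.
    std::ofstream xml("results.xml");
    runner.setOutputter(new CppUnit::XmlOutputter(&runner.result(), xml));

    // run() executes all tests regardless of individual failures and
    // returns true only if every test passed.
    bool allPassed = runner.run();
    return allPassed ? 0 : 1;
}
If the post-build step must continue even when tests fail, it can simply ignore the exit code and read results.xml for the pass/fail summary instead.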
Your question is too vague for a precise answer. Usually, a unit test engine returns a code to signal that it has failed (like a non-zero exit code in the shell on Linux) or generates an output file with the results. The calling system handles this. If you have written it yourself (some home-made scripts), you have to add the option to continue test execution even if an error occurred. If you are using a tool such as a continuous integration server, then you have to go through the docs and find the option that lets you continue when tests fail.
A workaround is to write a script that returns an "OK" result even if the unit tests fail, but then you lose some automatic verification ...
Be more specific if you want more clues.
my2c
I would just write your tests this way: instead of using the CPPUNIT_ASSERT macros (or whatever), write them in regular C++ with some way of logging errors.
You could use a macro for this too of course. Something like:
LOGASSERT( some_expression )
could be defined to execute some_expression and, if it fails, log the expression together with __FILE__ and __LINE__. You can also log exceptions, of course, as well as expected exceptions that were not thrown, simply by writing those checks in your tests (with macros if you want to log the expression that caused them, again with __FILE__ and __LINE__).
If you are writing macros I would advise you to limit the content of your macro to calling an inline function with extra parameters.
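A rough sketch of what such a macro and helper might look like (the names here are illustrative only):
#include <cstdio>

// Inline helper keeps the macro body tiny, as suggested above.
inline void logFailure(const char* expr, const char* file, int line)
{
    std::fprintf(stderr, "FAILED: %s (%s:%d)\n", expr, file, line);
}

// Evaluates the expression; on failure it logs the expression text with
// file and line instead of aborting, so the remaining tests keep running.
#define LOGASSERT(expr) \
    do { \
        if (!(expr)) { \
            logFailure(#expr, __FILE__, __LINE__); \
        } \
    } while (0)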

Is it possible to measure function coverage with gcov?

Currently we use gcov with our testing suite for a Linux C++ application, and it does a good job of measuring line coverage.
Can gcov produce a function/method coverage report in addition to line coverage?
Looking at the parameters gcov accepts, I do not think this is possible, but I may be missing something. Or perhaps there is another tool that can produce a function/method coverage report from the statistics generated by gcc?
Update: By function/method coverage I mean percentage of functions that get executed during tests.
I guess what you mean is the -f option, which will give you the percentage of lines covered per function. There is an interesting article about gcov at Dr. Dobb's which might be helpful. If "man gcov" doesn't show the -f flag, check whether you have a reasonably recent version of the gcc suite.
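For reference, a typical sequence looks something like this (file names are placeholders): compile with coverage instrumentation, run the tests, then ask gcov for the per-function summaries with -f:
g++ --coverage -o tests tests.cpp
./tests
gcov -f tests.cpp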
Edit: to get the percentage of functions not executed you can simply parse through the function coverage output, as 0.00% coverage should be pretty much equivalent to not called. This small script prints the percentage of functions not executed:
#!/bin/bash
# Prints the percentage of functions that were never executed,
# given a function coverage listing as the first argument.
if test -z "$1"
then
    echo "First argument must be function coverage file"
else
    notExecuted=`cat $1 | grep "^0.00%" | wc -l`
    executed=`cat $1 | grep -v "^0.00%" | wc -l`
    percentage=$(echo "scale=2; $notExecuted / ($notExecuted + $executed) * 100" | bc)
    echo $percentage
fi
We have started to use gcov and lcov together. The results from lcov do include the percentage of functions that are executed for the "module" you're looking at.
EDIT: The "module" can range from whole directories down to individual files.
I also want to add that if you are already using the GNU compiler tools, then gcov/lcov won't be too difficult for you to get running and the results it produces are very impressive.
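If you go the gcov + lcov route, the usual flow is roughly the following (paths and file names are placeholders):
lcov --capture --directory . --output-file coverage.info
genhtml coverage.info --output-directory coverage-html
Recent versions of lcov/genhtml report function coverage alongside line coverage in the generated HTML, which is the per-"module" percentage mentioned above.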
The lcov utility is nice, and we use it. But I'm not sure if you need it for what you want.
We:
1. Use ctags (wikipedia; sourceforge) to find all the functions declared in the relevant header files.
2. Run gcov to get line coverage for every function in the binary.
3. Compare the lists of functions from 1 & 2 to produce "Functions Called" / "Functions Available".
We call it "API coverage" since we apply step #1 only to public API headers. But you can do it on all headers or only a subset as you choose. I think the ratio we produce in this manner is the ratio you are looking for.

What do you need from a test harness?

I'm one of the people involved in the Test Anything Protocol (TAP) IETF group (if interested, feel free to join the mailing list). Many programming languages are starting to adopt TAP as their primary testing protocol and they want more from it than what we currently offer. As a result, we'd like to get feedback from people who have a background in xUnit, TestNG or any other testing framework/methodology.
Basically, aside from a simple pass/fail, what information do you need from a test harness? Just to give you some examples:
Filename and line number (if applicable)
Start and end time
Diagnostic output such as the difference between what you got and what you expected.
And so on ...
Most definitely all things from your list for each individual item:
Filename
Line number
Namespace/class/function name
Test coverage
Start time and end time
And/or total time (this would be more useful for me than the top two items)
Diagnostic output such as the difference between what you got and what you expected.
Off the top of my head, not much else, but for a group of tests I would like to know:
group name
total execution time
It must be very, very easy to write a test, and equally easy to run them. That, to me, is the single most important feature of a testing harness. If someone has to fire up a GUI or jump through a bunch of hoops to write a test, they won't use it.
An arbitrary set of tags - so I can mark a test as, for example "integration, UI, admin".
(you knew I was going to ask for this didn't you :-)
To what you said I'd add:
Method/function/class name
Coverage counting tool, with exceptions (Do not count these methods)
Result of N last runs available
Mandate that ways to easily parse test results must exist
Any sort of diagnostic output - especially on failure - is critical. If a test fails, you don't want to always have to rerun the test under a debugger to see what happened - there should be some clues in the output.
I also like to see a before and after snapshot of critical system variables like memory or hard disk space available as those can provide great clues as well.
Finally, if you're using random seeds for any of the tests, write the seed out to the logfile so that the test can be reproduced if necessary.
I'd like the ability to concatenate and nest TAP streams.
A unique id (uuid, md5sum) to be able to identify an individual test -- say, for use when inserting test results in a database, or identifying them in a bug tracker to make it possible for QA to rerun an individual test.
This would also make it possible to trace an individual test's behavior from build-to-build through the entire lifecycle of multiple revisions of a product. This could eventually allow larger-scale correlations between 'historic' events (new hire, product release, hardware upgrades) and the profile(s) of tests that fail as a result of such events.
I'm also thinking that TAP should be emitted through a dedicated side channel rather than mixed in with stdout. I'm not sure whether that is within the scope of the protocol definition.
I use TAP as output protocol for a set of simple C++ test methods, and have seen the following shortcomings:
test steps cannot be put into groups (there's only the grouping into several test scripts; but for running all the tests in our software, I need at least one more level of grouping, so that a single test step would be identified by something like "DB connection" -> "Reconnection Test" -> "test step #3")
seeing differences between expected and actual output is useful; I either print the diff to stderr (as a comment) or actually launch a graphical diff tool
the protocol and tools must be really language-independent. For example, so far I only know of the Perl "prove" tool for running tests, which is limited to running Perl scripts
In the end, the test output must be suitable as basis for easily generating an HTML report file which lists succeeded tests very concisely, gives detailed output for failed tests, and makes it possible to quickly jump into the IDE to the failing test line.
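For what it's worth, emitting TAP from plain C++ takes very little code; here is a hedged sketch (not the poster's actual harness):
#include <iostream>
#include <string>

// Hypothetical minimal TAP emitter.
static int g_testNumber = 0;

void tapOk(bool passed, const std::string& description)
{
    ++g_testNumber;
    std::cout << (passed ? "ok " : "not ok ") << g_testNumber
              << " - " << description << "\n";
}

int main()
{
    std::cout << "1..2\n";                        // the TAP plan
    tapOk(2 + 2 == 4, "arithmetic still works");
    tapOk(std::string("ab").size() == 2, "string length");
    return 0;
}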
optional ascii coloured output, green for good, yellow for pending, red for errors
the idea of things being pending
a summary at the end of the test report of commands that will run the individual tests where
something went wrong
something in the test was pending
Extension idea for TAP:
1..4
ok 1 - yay
not ok 2 - boo
ok 3 - yay #json:{...}
ok 4 - see my json
Ability to attach a #json comment...
- can be safely ignored by existing code
- well-defined tags can be easily reserved at testanything.org
- easy to produce, parse and read complex types
- yaml is a pain