How to easily find out which tests fail - unit-testing

I test my code with go test ./... -v -short.
Unfortunately, -v only prints out each test as it happens, but does not leave a summary of the results at the bottom like in Java. This means that if any test failed somewhere at the top, I have to scroll up and look for the word FAIL or search for it in a text editor.
The -failfast flag isn't helping either because some of my tests still get printed after the first test failure for some reason.
I don't really care if tests get run after the initial test failure. I just want to be able to easily tell if any test failed, preferably in just one place (e.g. a summary of how many tests passed or failed, or by seeing a flag if all tests passed or not).
Is there a way to easily tell if there was a test failure? I don't want to accidentally continue coding while I still have test failures.
I'm on Windows 10 64-bit.
UPDATE: Many thanks to @icza for the findstr tip. I later realized that I also wanted to see the error descriptions along with the test failures, but did not want to run go test twice. This is what I came up with for CMD (it does not work in PowerShell):
go test ./... -v -short > test-results.txt & findstr "FAIL _test" test-results.txt
Now findstr should report test failures as well as error descriptions. And if you want to see the full test results, simply open test-results.txt.

Failing tests are indicated with FAIL in the output. So all you have to do is filter the output for that word.
On Unix systems:
go test ./... |grep FAIL
On Windows:
go test ./... |findstr FAIL
Note that this is purely text processing; it doesn't know anything about Go tests and their results. This means you might get "false positives" if a test outputs the word FAIL even though it succeeds. But in practice, this pretty much does the job you want.
A more sophisticated and more accurate way to achieve this would be to pass -json flag to go test, so it generates JSON output, which you can process with a program (e.g. written in Go itself). Failing tests are indicated with a JSON object having an "Action":"fail" field, e.g.
{"Time":"2019-03-01T12:06:21.108544405+01:00","Action":"fail","Package":"some/package","Test":"TestSomething","Elapsed":0.01}
And even if you don't want to write a program for this, filtering the JSON output leaves less chance for false positives (filtering for "Action":"fail"):
Unix:
go test ./... -json |grep '"Action":"fail"'
Windows:
go test ./... -json |findstr /C:"\"Action\":\"fail\""
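If you do want to process the JSON output with a program written in Go, here is a minimal sketch that reads go test -json events from stdin, counts per-test results and lists the failing tests. The file name summarize.go and the exact output format are just placeholders for this example.

// summarize.go - a sketch of a small consumer for go test -json output.
// It counts per-test pass/fail/skip events and lists the failing tests.
package main

import (
    "bufio"
    "encoding/json"
    "fmt"
    "os"
)

// event holds only the fields of a test2json event that we care about.
type event struct {
    Action  string
    Package string
    Test    string
}

func main() {
    var passed, failed, skipped int
    var failures []string

    sc := bufio.NewScanner(os.Stdin)
    for sc.Scan() {
        var ev event
        if err := json.Unmarshal(sc.Bytes(), &ev); err != nil {
            continue // not a JSON event line, ignore it
        }
        if ev.Test == "" {
            continue // package-level event, not an individual test
        }
        switch ev.Action {
        case "pass":
            passed++
        case "fail":
            failed++
            failures = append(failures, ev.Package+"."+ev.Test)
        case "skip":
            skipped++
        }
    }

    fmt.Printf("passed: %d  failed: %d  skipped: %d\n", passed, failed, skipped)
    for _, name := range failures {
        fmt.Println("FAIL", name)
    }
    if failed > 0 {
        os.Exit(1)
    }
}

Run it as something like: go test ./... -json | go run summarize.go. It exits with a non-zero status when any test failed, so it can also be used to gate scripts.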

I found it painless to install gotestsum and get the neat summary at the end.
go install gotest.tools/gotestsum@latest
gotestsum --format testname # Or dots
An alternative, if you only care about the count, is:
go test |grep FAIL |wc -l

Related

(Google Test) Automatically retry a test if it failed the first time

Our team uses Google Test for automated testing. Most of our tests pass consistently, but a few seem to fail ~5% of the time due to race conditions, network time-outs, etc.
We would like the ability to mark certain tests as "flaky". A flaky test would be automatically re-run if it fails the first time, and will only fail the test suite if it fails both times.
Is this something Google Test offers out-of-the-box? If not, is it something that can be built on top of Google Test?
You have several options:
Use --gtest_repeat for the test executable:
The --gtest_repeat flag allows you to repeat all (or selected) test methods in a program many times. Hopefully, a flaky test will eventually fail and give you a chance to debug.
You can mimic tagging your tests by adding "flaky" somewhere in their names and then use the gtest_filter option to repeat them. Below are some examples from the Google Test documentation:
$ foo_test --gtest_repeat=1000
Repeat foo_test 1000 times and don't stop at failures.
$ foo_test --gtest_repeat=-1
A negative count means repeating forever.
$ foo_test --gtest_repeat=1000 --gtest_break_on_failure
Repeat foo_test 1000 times, stopping at the first failure. This is especially useful when running under a debugger: when the test fails, it will drop into the debugger and you can then inspect variables and stacks.
$ foo_test --gtest_repeat=1000 --gtest_filter=Flaky.*
Repeat the tests whose name matches the filter 1000 times.
See here for more info.
Use bazel to build and run your tests:
Rather than tagging your tests in the test files, you can tag them in the bazel BUILD files.
You can tag each test individually using cc_test rule.
You can also define a set of tests (using test_suite) in the BUILD file and tag them together (e.g. "small", "large", "flaky", etc). See here for an example.
Once you tag your tests, you can use simple commands like this:
% bazel test --test_tag_filters=performance,stress,-flaky //myproject:all
The above command will run all tests in //myproject that are tagged performance or stress and are not tagged flaky.
See here for documentation.
Using Bazel is probably cleaner because you don't have to modify your test files, and you can quickly change your test tags if things change.
See this repo and this video for examples of running tests using bazel.
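As described above, these options help you reproduce or isolate flaky tests rather than automatically re-running them. If you also want the "re-run once, fail only if it fails twice" behaviour from the question, one way to build it outside Google Test is a small wrapper around the test executable. The sketch below is written in Go purely as an illustration; the file name retry.go and the once-only retry policy are assumptions, not something taken from the answers above.

// retry.go - a sketch of a wrapper that re-runs a flaky test binary once.
// It is not part of Google Test; it only looks at the process exit code.
package main

import (
    "fmt"
    "os"
    "os/exec"
)

// runOnce executes the test binary with the given arguments, streaming its
// output through. A non-nil error means a non-zero exit code, i.e. failures.
func runOnce(bin string, args []string) error {
    cmd := exec.Command(bin, args...)
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    return cmd.Run()
}

func main() {
    if len(os.Args) < 2 {
        fmt.Fprintln(os.Stderr, "usage: retry <test-executable> [gtest flags...]")
        os.Exit(2)
    }
    bin, args := os.Args[1], os.Args[2:]

    if runOnce(bin, args) == nil {
        return // first run passed, nothing more to do
    }
    fmt.Fprintln(os.Stderr, "first run failed, retrying once...")
    if runOnce(bin, args) != nil {
        os.Exit(1) // failed both times: treat it as a real failure
    }
}

Combined with --gtest_filter you could limit the retry to tests you have marked as flaky in their names, e.g. retry ./foo_test --gtest_filter=Flaky.*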

How to keep the unit test output in Jenkins

We have managed to have Jenkins correctly parse the XML output from our tests, including the error information when there is any, so that it is possible to see, directly in the test case in Jenkins, the error that occurred.
What we would like to do is to have Jenkins keep a log output, which is basically the console output, associated with each case. This would enable anyone to see the actual console output of each test case, failed or not.
I haven't seen a way to do this.
* EDIT *
Clarification - I want to be able to see the actual test output directly in the Jenkins interface, the same way it does when there is an error, but for the whole output. I don't want Jenkins to only keep the file as an artifact.
* END OF EDIT *
Can anyone help us with this?
In the Publish JUnit test result report (Post-build Actions) tick the Retain long standard output/error checkbox.
If checked, any standard output or error from a test suite will be
retained in the test results after the build completes. (This refers
only to additional messages printed to console, not to a failure stack
trace.) Such output is always kept if the test failed, but by default
lengthy output from passing tests is truncated to save space. Check
this option if you need to see every log message from even passing
tests, but beware that Jenkins's memory consumption can substantially
increase as a result, even if you never look at the test results!
This is simple to do - just ensure that the output file is included in the list of artifacts for that job and it will be archived according to the configuration for that job.
Not sure if you have solved it yet, but I just did something similar using Android and Jenkins.
What I did was use http://code.google.com/p/the-missing-android-xml-junit-test-runner/ to run the tests in the Android emulator. This creates the necessary JUnit-formatted XML files on the emulator file system.
Afterwards, simply use 'adb pull' to copy the files over, and configure Jenkins to parse the results. You can also archive the XML files as artifacts if necessary.
If you simply want to display the content of the result in the log, you can use 'Execute Shell' command to print it out to the console, where it will be captured in the log file.
Since Jenkins 1.386 there has been an option to Retain long standard output/error in each build configuration. So you just have to check the checkbox in the post-build actions.
http://hudson-ci.org/changelog.html#v1.386
When using a declarative pipeline, you can do it like so:
junit testResults: '**/build/test-results/*/*.xml', keepLongStdio: true
See the documentation:
If checked, the default behavior of failing a build on missing test result files or empty test results is changed to not affect the status of the build. Please note that this setting make it harder to spot misconfigured jobs or build failures where the test tool does not exit with an error code when not producing test report files.

CppUnit setup for C++

In CppUnit we run unit tests as part of the build, in a post-build step, and we will be running multiple tests as part of this. If any test case fails, the post-build step should not stop; it should go ahead, run all the test cases, and report a summary of how many test cases passed and failed. How can we achieve this?
Thanks!
His question is specific enough. You need a test runner. Encapsulate each test in its own behavior and class, and keep the test project separate from the tested code. Afterwards, just configure your XMLOutputter. You can find an excellent example of how to do this on the YoLinux website: http://www.yolinux.com/TUTORIALS/CppUnit.html
We compile our test projects for our main projects this way and check whether everything is OK. After that, it is mostly a matter of maintaining your test code.
Your question is too vague for a precise answer. Usually, a unit test engine returns a code to signal that it has failed (like a non-zero return code in the shell on Linux) or generates an output file with the results, and the calling system handles this. If you have written the calling system yourself (some home-made scripts), you have to add an option to continue test execution even if an error occurred. If you are using a tool like a continuous integration server, then you have to go through the documentation and find the option that lets the tests keep running when some of them fail.
A workaround is to write a script that returns an "OK" result even if the unit tests fail, but then you lose some automatic verification...
Be more specific if you want more clues.
my2c
I would just write your tests this way: instead of using the CPPUNIT_ASSERT macros (or whatever), write them in regular C++ with some way of logging errors.
You could use a macro for this too of course. Something like:
LOGASSERT( some_expression )
could be defined to execute some_expression and to log the expression together with __FILE__ and __LINE__ if it fails. You can also log exceptions, of course, as well as expected exceptions that are not thrown, simply by writing checks for them in your tests (again with macros, if you want to log the offending expression with __FILE__ and __LINE__).
If you are writing macros, I would advise you to limit the content of each macro to calling an inline function with extra parameters.

Is it possible to measure function coverage with gcov?

Currently we use gcov with our testing suite for a Linux C++ application and it does a good job of measuring line coverage.
Can gcov produce function/method coverage report in addition to line coverage?
Looking at the parameters gcov accepts I do not think it is possible, but I may be missing something. Or, probably, is there any other tool that can produce function/method coverage report out of statistics generated by gcc?
Update: By function/method coverage I mean percentage of functions that get executed during tests.
I guess what you mean is the -f option, which will give you the percentage of lines covered per function. There is an interesting article about gcov at Dr. Dobb's which might be helpful. If "man gcov" doesn't show the -f flag, check whether you have a reasonably recent version of the gcc suite.
Edit: to get the percentage of functions not executed you can simply parse through the function coverage output, as 0.00% coverage should be pretty much equivalent to not called. This small script prints the percentage of functions not executed:
#!/bin/bash
# Prints the percentage of functions that were never executed, based on
# the function coverage output produced by gcov -f.
if test -z "$1"
then
    echo "First argument must be function coverage file"
else
    notExecuted=$(grep -c "^0.00%" "$1")
    executed=$(grep -vc "^0.00%" "$1")
    percentage=$(echo "scale=2; $notExecuted / ($notExecuted + $executed) * 100" | bc)
    echo "$percentage"
fi
We have started to use gcov and lcov together. The results from lcov do include the percentage of functions that are executed for the "module" you're looking at.
EDIT: The module can go from directories down to files.
I also want to add that if you are already using the GNU compiler tools, then gcov/lcov won't be too difficult for you to get running and the results it produces are very impressive.
The lcov utility is nice, and we use it. But I'm not sure if you need it for what you want.
We:
1. Use ctags (wikipedia; sourceforge) to find all the functions declared in the relevant header files.
2. Run gcov to get line coverage for every function in the binary.
3. Compare the lists of functions from 1 and 2 to produce "Functions Called" / "Functions Available".
We call it "API coverage" since we apply step #1 only to public API headers. But you can do it on all headers or only a subset, as you choose. I think the ratio we produce in this manner is the ratio you are looking for.
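As an illustration of step #3, here is a small sketch in Go that compares the two lists. The file name apicoverage.go is made up, and it assumes both inputs have already been reduced to plain text files with one function name per line; how you extract those names from the ctags and gcov output is left out.

// apicoverage.go - a sketch of step #3: compare a list of available
// functions with a list of functions that were actually executed.
// Both inputs are plain text files with one function name per line.
package main

import (
    "bufio"
    "fmt"
    "os"
)

// readSet loads one name per line from a file into a set.
func readSet(path string) (map[string]bool, error) {
    f, err := os.Open(path)
    if err != nil {
        return nil, err
    }
    defer f.Close()

    set := make(map[string]bool)
    sc := bufio.NewScanner(f)
    for sc.Scan() {
        if name := sc.Text(); name != "" {
            set[name] = true
        }
    }
    return set, sc.Err()
}

func main() {
    if len(os.Args) != 3 {
        fmt.Fprintln(os.Stderr, "usage: apicoverage <available-functions> <called-functions>")
        os.Exit(2)
    }
    available, err := readSet(os.Args[1])
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    called, err := readSet(os.Args[2])
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    if len(available) == 0 {
        fmt.Println("no functions found in", os.Args[1])
        return
    }

    covered := 0
    for name := range available {
        if called[name] {
            covered++
        }
    }
    fmt.Printf("functions called: %d / %d (%.2f%%)\n",
        covered, len(available), 100*float64(covered)/float64(len(available)))
}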

What do you need from a test harness?

I'm one of the people involved in the Test Anything Protocol (TAP) IETF group (if interested, feel free to join the mailing list). Many programming languages are starting to adopt TAP as their primary testing protocol and they want more from it than what we currently offer. As a result, we'd like to get feedback from people who have a background in xUnit, TestNG or any other testing framework/methodology.
Basically, aside from a simple pass/fail, what information do you need from a test harness? Just to give you some examples:
Filename and line number (if applicable)
Start and end time
Diagnostic output such as the difference between what you got and what you expected.
And so on ...
Most definitely all things from your list for each individual item:
Filename
Line number
Namespace/class/function name
Test coverage
Start time and end time
And/or total time (this would be more useful for me than the top two items)
Diagnostic output such as the difference between what you got and what you expected.
From the top of my head not much else but for the group of tests I would like to know
group name
total execution time
It must be very, very easy to write a test, and equally easy to run them. That, to me, is the single most important feature of a testing harness. If someone has to fire up a GUI or jump through a bunch of hoops to write a test, they won't use it.
An arbitrary set of tags - so I can mark a test as, for example "integration, UI, admin".
(you knew I was going to ask for this didn't you :-)
To what you said I'd add:
Method/function/class name
Coverage counting tool, with exceptions (Do not count these methods)
Result of N last runs available
Mandate that ways to easily parse test results must exist
Any sort of diagnostic output - especially on failure - is critical. If a test fails, you don't want to always have to rerun the test under a debugger to see what happened - there should be some clues in the output.
I also like to see a before and after snapshot of critical system variables like memory or hard disk space available as those can provide great clues as well.
Finally, if you're using random seeds for any of the tests, write the seed out to the logfile so that the test can be reproduced if necessary.
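To make the last point concrete, here is a minimal sketch using Go's testing package (the package and test names are made up): the seed goes into the test log, and re-running with that fixed value reproduces the failure.

// A test that logs its random seed so a failing run can be reproduced.
package mypkg // hypothetical package name

import (
    "math/rand"
    "testing"
    "time"
)

func TestWithLoggedSeed(t *testing.T) {
    seed := time.Now().UnixNano() // replace with the logged value to reproduce a failure
    t.Logf("random seed: %d", seed)
    r := rand.New(rand.NewSource(seed))

    // ... use r for any randomised inputs in the test ...
    _ = r.Intn(100)
}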
I'd like the ability to concatenate and nest TAP streams.
A unique id (uuid, md5sum) to be able to identify an individual test -- say, for use when inserting test results in a database, or identifying them in a bug tracker to make it possible for QA to rerun an individual test.
This would also make it possible to trace an individual test's behavior from build-to-build through the entire lifecycle of multiple revisions of a product. This could eventually allow larger-scale correlations between 'historic' events (new hire, product release, hardware upgrades) and the profile(s) of tests that fail as a result of such events.
I'm also thinking that TAP should be emitted through a dedicated side channel rather than mixed in with stdout. I'm not sure whether this is within the scope of the protocol definition.
I use TAP as output protocol for a set of simple C++ test methods, and have seen the following shortcomings:
test steps cannot be put into groups (there is only the grouping into several test scripts; but for running all the tests in our software, I need at least one more level of grouping, so that a single test step would be identified by something like "DB connection" -> "Reconnection Test" -> "test step #3")
seeing differences between expected and actual output is useful; I either print the diff to stderr (as comment) or actually launch a graphical diff tool
the protocol and tools must be really language-independent. For example, so far I only know of the Perl "prove" tool for running tests, which is limited to running Perl scripts
In the end, the test output must be suitable as basis for easily generating an HTML report file which lists succeeded tests very concisely, gives detailed output for failed tests, and makes it possible to quickly jump into the IDE to the failing test line.
optional ascii coloured output, green for good, yellow for pending, red for errors
the idea of things being pending
a summary at the end of the test report of commands that will run the individual tests where
something went wrong
something in the test was pending
Extension idea for TAP:
1..4
ok 1 - yay
not ok 2 - boo
ok 3 - yay #json:{...}
ok 4 - see my json
Ability to attach a #json comment...
- can be safely ignored by existing code
- well-defined tags can be easily reserved at testanything.org
- easy to produce, parse and read complex types
- yaml is a pain
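To illustrate how cheap the proposed tag would be to consume, here is a sketch of a reader in Go. The #json: syntax is only the idea from this answer, not part of the TAP spec, and the file name tapjson.go is made up; a plain consumer just ignores everything after #, while a json-aware one decodes the payload.

// tapjson.go - a sketch of a TAP consumer that understands the proposed
// "#json:" trailer. Lines without the trailer are handled exactly as before.
package main

import (
    "bufio"
    "encoding/json"
    "fmt"
    "os"
    "strings"
)

func main() {
    sc := bufio.NewScanner(os.Stdin)
    for sc.Scan() {
        line := sc.Text()
        ok := strings.HasPrefix(line, "ok ")
        notOK := strings.HasPrefix(line, "not ok ")
        if !ok && !notOK {
            continue // plan line, comment or something else: ignore
        }

        status := "PASS"
        if notOK {
            status = "FAIL"
        }

        // The test description is everything before the '#' comment.
        desc := line
        if i := strings.Index(line, "#"); i >= 0 {
            desc = strings.TrimSpace(line[:i])
        }

        // A json-aware consumer can decode the structured payload; an
        // ordinary TAP consumer simply never looks at it.
        var extra map[string]interface{}
        if i := strings.Index(line, "#json:"); i >= 0 {
            _ = json.Unmarshal([]byte(line[i+len("#json:"):]), &extra)
        }

        if extra != nil {
            fmt.Println(status, desc, extra)
        } else {
            fmt.Println(status, desc)
        }
    }
}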