Randomize unit-tests using cmake/ctest - unit-testing

I manage a fairly large open-source project with many unit tests (~200 files), and running all of them is quite time-consuming for the continuous integration. We use cmake/ctest/Catch2 as the unit-test framework.
Is there a way to tell cmake/ctest to only build and run a random subset of the unit tests (e.g. just 30%)?
When iterating with several commits on the code for a given feature, the probability that all tests were checked tends to one, but each individual commit would be much faster.
Obviously, this ratio would be set to 100% when preparing a PR or a release.

ALTERNATIVE
Finally, I came up with a cmake solution by creating a new add_test() wrapper function that activates each test based on a random draw:
function(my_add_test test_file) #optional_avoid_add_test
  # Draw a random two-digit number (00-99); register the test only if it
  # falls below the configured threshold.
  string(RANDOM LENGTH 2 ALPHABET "0123456789" _random)
  if (${_random} LESS ${THRESHOLD_RANDOM_TESTING})
    add_executable(${test_file} ${test_file}.cpp)
    add_test(${test_file} ${test_file})
  endif()
endfunction()
In my main CMakeLists.txt I have the global cache variable (which can be set from the cmake CLI/GUI):
set(THRESHOLD_RANDOM_TESTING "100" CACHE STRING "~% of unit tests to build and run.")
Each time I regenerate the project, a new random selection is constructed.
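For illustration (a sketch, not verbatim from my project; the build/ directory name and the cmake -S/-B and ctest --test-dir invocations assume a reasonably recent CMake), picking ~30% of the tests looks like:
# Configure with ~30% of the tests enabled; re-running this step reshuffles the selection
cmake -S . -B build -DTHRESHOLD_RANDOM_TESTING=30
cmake --build build
ctest --test-dir build    # or: cd build && ctest, for older CMake versions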

Is there a way to tell cmake/ctest
Yes. Get all tests, shuffle them, take 30%, and pass them back to ctest.
Looks like fun; in a Linux shell it would be just:
ctest -N | sed -n 's/ Test #[0-9]\+: //p' | { tmp=$(cat); cnt=$(wc -l <<<"$tmp"); shuf -n "$((cnt * 30 / 100))" <<<"$tmp"; }
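To actually run only the selected tests, one option (a sketch, assuming the test names contain no regex metacharacters) is to join the selection into a regular expression for ctest -R:
all=$(ctest -N | sed -n 's/ *Test #[0-9]\+: //p')                    # every registered test name
picked=$(shuf -n "$(( $(wc -l <<<"$all") * 30 / 100 ))" <<<"$all")   # random 30% of them
ctest -R "^($(paste -sd'|' <<<"$picked"))$"                          # run only the picked tests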
I meant an automatic way to do that.
No, there is not.

Related

(Google Test) Automatically retry a test if it failed the first time

Our team uses Google Test for automated testing. Most of our tests pass consistently, but a few seem to fail ~5% of the time due to race conditions, network time-outs, etc.
We would like the ability to mark certain tests as "flaky". A flaky test would be automatically re-run if it fails the first time, and will only fail the test suite if it fails both times.
Is this something Google Test offers out-of-the-box? If not, is it something that can be built on top of Google Test?
You have several options:
Use --gtest_repeat for the test executable:
The --gtest_repeat flag allows you to repeat all (or selected) test methods in a program many times. Hopefully, a flaky test will eventually fail and give you a chance to debug.
You can mimic tagging your tests by adding "flaky" somewhere in their names, and then use the gtest_filter option to repeat them. Below are some examples from Google documentation:
$ foo_test --gtest_repeat=1000
Repeat foo_test 1000 times and don't stop at failures.
$ foo_test --gtest_repeat=-1
A negative count means repeating forever.
$ foo_test --gtest_repeat=1000 --gtest_break_on_failure
Repeat foo_test 1000 times, stopping at the first failure. This
is especially useful when running under a debugger: when the test
fails, it will drop into the debugger and you can then inspect
variables and stacks.
$ foo_test --gtest_repeat=1000 --gtest_filter=Flaky.*
Repeat the tests whose name matches the filter 1000 times.
See here for more info.
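As far as I know there is no built-in retry flag, so a thin wrapper script is one way to approximate what you describe (a rough sketch; ./foo_test is a placeholder for your test binary, and this reruns the whole suite rather than only the failed tests):
#!/bin/bash
# Run the suite once; on failure, retry once and let the second run decide.
./foo_test && exit 0
echo "First run failed -- retrying once to rule out flaky failures..." >&2
exec ./foo_test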
Use bazel to build and run your tests:
Rather than tagging your tests in the test files, you can tag them in the bazel BUILD files.
You can tag each test individually using cc_test rule.
You can also define a set of tests (using test_suite) in the BUILD file and tag them together (e.g. "small", "large", "flaky", etc). See here for an example.
Once you tag your tests, you can use simple commands like this:
% bazel test --test_tag_filters=performance,stress,-flaky //myproject:all
The above command will run every test in myproject that is tagged performance or stress and is not tagged flaky.
See here for documentation.
Using Bazel is probably cleaner because you don't have to modify your test files, and you can quickly change the test tags if things change.
See this repo and this video for examples of running tests using bazel.
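If I remember correctly, Bazel also has first-class support for flaky tests: test rules accept a flaky attribute, and there is a --flaky_test_attempts flag, so something along these lines retries failing tests automatically (treat the target label as a placeholder):
# Re-run each failing test up to 2 times before declaring it failed
bazel test --flaky_test_attempts=2 //myproject:all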

How to easily find out which tests fail

I test my code with go test ./... -v -short.
Unfortunately, -v only prints out each test as it happens, but does not leave a summary of the results at the bottom like in Java. This means that if any test failed somewhere at the top, I have to scroll up and look for the word FAIL or search for it in a text editor.
The -failfast flag isn't helping either because some of my tests still get printed after the first test failure for some reason.
I don't really care if tests get run after the initial test failure. I just want to be able to easily tell if any test failed, preferably in just one place (e.g. a summary of how many tests passed or failed, or by seeing a flag if all tests passed or not).
Is there a way to easily tell if there was a test failure? I don't want to accidentally continue coding while I still have test failures.
I'm on Windows 10 64-bit.
UPDATE: Many thanks to @icza for the findstr tip. I later realized that I also wanted to see the error descriptions along with the test failures, but did not want to run go test twice. This is what I came up with for CMD (it does not work in PowerShell):
go test ./... -v -short > test-results.txt & findstr "FAIL _test" test-results.txt
Now findstr should report test failures as well as error descriptions. And if you want to see the full test results, simply open test-results.txt.
Failing tests are indicated with FAIL in the output. So all you have to do is filter the output for that word.
On Unix systems:
go test ./... |grep FAIL
On Windows:
go test ./... |findstr FAIL
Note that this is purely text processing, it doesn't know anything about go tests and their results. This means you might get "false positives" if a test outputs the word FAIL even if it succeeds. But in practice, this pretty much does the job you want.
A more sophisticated and more accurate way to achieve this would be to pass -json flag to go test, so it generates JSON output, which you can process with a program (e.g. written in Go itself). Failing tests are indicated with a JSON object having an "Action":"fail" field, e.g.
{"Time":"2019-03-01T12:06:21.108544405+01:00","Action":"fail",
"Package":"some/package","Test":"TestSomething","Elapsed":0.01}
And even if you don't want to write a program for this, filtering the JSON output leaves less chance for false positives (filtering for "Action":"fail"):
Unix:
go test ./... -json |grep '"Action":"fail"'
Windows:
go test ./... -json |findstr /C:"\"Action\":\"fail\""
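If jq happens to be installed (an extra assumption, it is not part of the Go toolchain), the JSON stream can be filtered a bit more precisely, printing just the failing package and test names:
go test ./... -json | jq -r 'select(.Action=="fail" and .Test != null) | .Package + ": " + .Test'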
I found it painless to install gotestsum and get the neat summary at the end.
go install gotest.tools/gotestsum@latest
gotestsum --format testname # Or dots
An alternative, if you only care about the count, is:
go test |grep FAIL |wc -l

Suggestions for unit testing

Hello, another question concerning testing: automatically generating test cases when I know the parameter set, and doing it all at once instead of during development (could kick myself).
I have a set of parameters for my software that I wish to test (~12 parameters only). However, these parameters are often integers, so for every parameter there are four values that make sense (0, insanely huge, normally big, normally small).
Is there a way I can generate my test cases automatically? It would save me a lot of time. I already have to inspect every test case by hand, do I not? A lot of my program produces output to the console, and I mostly work on home-made data structures, so simple assertions probably won't work.
My dream option would be a kind of reverse regular expression, where I set the rules and get a file generated that I can use as an input (my software has a crude scripting language). That way I can assemble all input files and test them one by one.
Looking forward to your kind suggestions.
cheers
There are lots of ways to generate test cases in your scenario -- though you're a bit vague on what form the inputs for your programs and units need to take. For one of my Fortran programs I use a template input parameter file, a bash script and a make file. The make file, when called on the test phony target:
a) compiles the program;
b) runs the bash script, which uses sed to replace placeholders in the template parameter file, to create 128 (or whatever) test input files;
c) submits all the test jobs to the job management system on our cluster.
Once the jobs have finished I have some other scripts to compare outputs with benchmarks, collect statistics, that sort of thing.
If you need more specific advice, post more specific questions.
EDIT: Using sed inside a bash script:
Suppose that the parameter input template file contains 3 codes to be replaced: $FREQ$, $NUM$ and $TOL$. Then I write a bash script with a 3-deep loop nest something like this:
for frq in 0.01 0.0 1 10
do
  for np in 1 2 4 8 16
  do
    for tol in 0.001 0.0001 0.00001
    do
      sed ....
    done
  done
done
It's not pretty but it works, and it saves me wrestling with much more sophisticated solutions such as xUnit testing or Python programming.
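For concreteness, the sed line inside the innermost loop could look something like this (a sketch; params.template and the output naming scheme are hypothetical):
sed -e 's/\$FREQ\$/'"$frq"'/g' \
    -e 's/\$NUM\$/'"$np"'/g' \
    -e 's/\$TOL\$/'"$tol"'/g' \
    params.template > "input_${frq}_${np}_${tol}.txt"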
I suggest you read something about data-driven unit testing.
There are lots of frameworks that can help you with that.
You may start here: http://www.slideshare.net/dnastacio/datadriven-unit-testing-for-java-1933154.
I see that you work with FORTRAN, and you probably deal with one of Fortran's versions of xUnit. Being a user of JUnit, I'd suggest parameterized tests - see if the concept applies in your case.

Is it possible to measure function coverage with gcov?

Currently we use gcov with our testing suite for Linux C++ application and it does a good job at measuring line coverage.
Can gcov produce function/method coverage report in addition to line coverage?
Looking at the parameters gcov accepts I do not think it is possible, but I may be missing something. Or, probably, is there any other tool that can produce function/method coverage report out of statistics generated by gcc?
Update: By function/method coverage I mean percentage of functions that get executed during tests.
I guess what you mean is the -f option, which will give you the percentage of lines covered per function. There is an interesting article about gcov at Dr. Dobb's which might be helpful. If "man gcov" doesn't show the -f flag, check if you have a reasonably recent version of the gcc suite.
Edit: to get the percentage of functions not executed you can simply parse through the function coverage output, as 0.00% coverage should be pretty much equivalent to not called. This small script prints the percentage of functions not executed:
#!/bin/bash
if test -z "$1"
then
    echo "First argument must be function coverage file"
else
    # Functions reported with 0.00% coverage are counted as "not executed".
    notExecuted=$(grep -c "^0.00%" "$1")
    executed=$(grep -vc "^0.00%" "$1")
    percentage=$(echo "scale=2; $notExecuted / ($notExecuted + $executed) * 100" | bc)
    echo "$percentage"
fi
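If I remember the gcov -f output format correctly (each function is reported as a Function '...' line followed by a Lines executed:NN% line), something like this could produce a suitable input file -- treat the file names, the -o object directory and the sed pattern as assumptions to check against your gcov version:
# Per-function percentages, one per line, so each line starts with e.g. "0.00%"
# (the per-file summary line is also caught; drop it if it skews the ratio)
gcov -f -o build/ src/mysource.cpp | sed -n 's/^Lines executed:\([0-9.]*%.*\)/\1/p' > function_coverage.txt
./function_coverage.sh function_coverage.txt    # the script above, saved under a hypothetical name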
We have started to use gcov and lcov together. The results from lcov do include the percentage of functions that are executed for the "module" you're looking at.
EDIT: The module can go from directories down to files.
I also want to add that if you are already using the GNU compiler tools, then gcov/lcov won't be too difficult for you to get running and the results it produces are very impressive.
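For reference, the basic lcov workflow we use looks roughly like this (a sketch; the directory names are placeholders, and the function-coverage columns appear in the generated HTML report with reasonably recent lcov versions):
# Collect the .gcda/.gcno data produced by the instrumented test run
lcov --capture --directory build/ --output-file coverage.info
# Render an HTML report with per-file line and function coverage
genhtml coverage.info --output-directory coverage_html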
The lcov utility is nice, and we use it. But I'm not sure if you need it for what you want.
We:
1) Use ctags (wikipedia; sourceforge) to find all the functions declared in the relevant header files.
2) Run GCOV to get line coverage for every function in the binary.
3) Compare the lists of functions from 1 & 2 to produce "Functions Called" / "Functions Available".
We call it "API coverage" since we apply step #1 only to public API headers. But you can do it on all headers or only a subset as you choose. I think the ratio we produce in this manner is the ratio you are looking for.
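A rough shell sketch of that recipe (everything here is an assumption to adapt: the header/source globs, the ctags kind flags, and especially the scraping of gcov -f text output, whose exact format varies between versions):
# 1) Functions declared in the public headers (Exuberant/Universal ctags)
ctags -x --c++-kinds=pf include/*.h | awk '{print $1}' | sort -u > api_functions.txt
# 2) Functions gcov reports as executed at least once
gcov -f -o build/ src/*.cpp \
  | awk -F"'" '/^Function /{name=$2}
               /^Lines executed:/{ if (name != "" && $0 !~ /executed:0\.00%/) print name; name="" }' \
  | sort -u > called_functions.txt
# 3) "Functions Called" / "Functions Available"
echo "scale=2; $(comm -12 api_functions.txt called_functions.txt | wc -l) / $(wc -l < api_functions.txt)" | bc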

Resetting detection of source file changes

Sometimes I have to work on code that moves the computer clock forward. In this case some .cpp or .h files get their latest modification date set to the future time.
Later on, when my clock is fixed and I compile my sources, the system rebuilds most of the project because some of the modification dates are in the future. Each subsequent recompile has the same problem.
Solutions that I know of are:
a) Find the files that have a future time and re-save them. This method is not ideal because the project is very big and it takes time even for Windows advanced search to find the changed files.
b) Delete the whole project and check it out again from svn.
Does anyone know how I can get around this problem?
Is there perhaps a setting in visual studio that will allow me to tell the compiler to use the archive bit instead of the last modification date to detect source file changes?
Or perhaps there is a recursive modification date reset tool that can be used in this situation?
I would recommend using a virtual machine where you can mess with the clock to your heart's content and it won't affect your development machine. Two free ones are Virtual PC from Microsoft and VirtualBox from Sun.
If this was my problem, I'd look for ways to avoid mucking with the system time. Isolating the code under unit tests, or a virtual machine, or something.
However, because I love PowerShell:
Get-ChildItem -r . |
? { $_.LastWriteTime -gt ([DateTime]::Now) } |
Set-ItemProperty -Name "LastWriteTime" -Value ([DateTime]::Now)
I don't know if this works in your situation, but how about not moving your clock forward at all: wrap your gettime method (or whatever you're using) and make it return the future time that you need?
Install Unix Utils
touch temp
find . -newer temp -exec touch {} ;
rm temp
Make sure to use the full path when calling find or it will probably use Windows' find.exe instead. This is untested in the Windows shell -- you might need to modify the syntax a bit.
I don't use Windows - but surely there is something like awk or grep that you can use to find the "future"-timestamped files and then "touch" them so they have the right time - even a perl script.
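With GNU findutils installed (an assumption; an older Unix Utils port of find may not support this), that idea is a one-liner:
# Touch every file whose modification time is later than "now" (GNU find's -newermt)
find . -newermt now -exec touch {} +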
1) Use a build system that doesn't use timestamps to detect modifications, like SCons.
2) Use ccache to speed up a build system that does use timestamps (and rebuild all).
In either case, content hashes (md5sums) rather than timestamps are used to verify whether a file has been modified.
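For ccache, one common way to hook it in (an illustration, not the only one) is simply to prefix the compilers, for build systems that honour CC/CXX:
# Route compiler invocations through ccache; unchanged sources become cache hits
export CC="ccache gcc"
export CXX="ccache g++"
make clean && make    # the "rebuild all" is now mostly served from the cache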