Hello, another question concerning debugging: automatically generating test cases when I know the parameter set, and doing it all at once instead of during development (could kick myself).
I have a set of parameters for my software that I wish to test (only ~12 parameters). Of course these parameters are often integers, so for every parameter there are about 4 values that make sense (0, insanely huge, normally big, normally small).
Is there a way I can generate my test cases automatically? It would save me a lot of time. I already have to inspect every test case by hand, do I not? A lot of my program's output goes to the console, so normal assertions probably won't work; I also work on home-made data structures most of the time, so I could not use a simple assertion.
My dream option would be a kind of reverse regular expression, where I set the rules and get a file generated that I can use as input (my software has a crude scripting language). That way I could assemble all the input files and test them one by one.
Looking forward to hearing your suggestions.
cheers
There are lots of ways to generate test cases in your scenario -- though you're a bit vague on what form the inputs for your programs and units need to take. For one of my Fortran programs I use a template input parameter file, a bash script and a makefile. The makefile, when invoked on the phony test target:
a) compiles the program;
b) runs the bash script, which uses sed to replace placeholders in the template parameter file, to create 128 (or whatever) test input files;
c) submits all the test jobs to the job management system on our cluster.
Once the jobs have finished I have some other scripts to compare outputs with benchmarks, collect statistics, that sort of thing.
If you need more specific advice, post more specific questions.
EDIT: Using sed inside a bash script:
Suppose that the parameter input template file contains 3 codes to be replaced: $FREQ$, $NUM$ and $TOL$. Then I write a bash script with a loop nest three deep, something like this:
for frq in 0.01 0.0 1 10
do
  for np in 1 2 4 8 16
  do
    for tol in 0.001 0.0001 0.00001
    do
      # substitute each placeholder and write one input file per combination
      # (template.in and the output file name are just examples)
      sed -e 's/\$FREQ\$/'"$frq"'/' \
          -e 's/\$NUM\$/'"$np"'/' \
          -e 's/\$TOL\$/'"$tol"'/' \
          template.in > "test_${frq}_${np}_${tol}.in"
    done
  done
done
It's not pretty but it works, and it saves me wrestling with much more sophisticated solutions such as xUnit testing or Python programming.
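If you ever do reach for the Python option mentioned above, the same idea is only a few lines with itertools, and it scales more comfortably to the 12 parameters in the question. This is just a sketch: the template name, the $...$ placeholder syntax and the output file names are examples, not anything your setup requires.

# Enumerate every combination of parameter values and write one input file
# per combination, by substituting placeholders in a template file.
import itertools
from pathlib import Path

template = Path("template.in").read_text()   # illustrative template name

values = {
    "$FREQ$": ["0.01", "0.0", "1", "10"],
    "$NUM$":  ["1", "2", "4", "8", "16"],
    "$TOL$":  ["0.001", "0.0001", "0.00001"],
}

for i, combo in enumerate(itertools.product(*values.values())):
    text = template
    for placeholder, value in zip(values, combo):
        text = text.replace(placeholder, value)
    Path("test_%03d.in" % i).write_text(text)  # illustrative output name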
I suggest you read something about data-driven unit testing.
There are lots of frameworks that can help you with that.
You may start here: http://www.slideshare.net/dnastacio/datadriven-unit-testing-for-java-1933154.
I see that you work with Fortran, so you probably deal with one of the Fortran versions of xUnit. Being a user of JUnit, I'd suggest parameterized tests - see if the concept applies in your case.
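Purely to illustrate what a parameterized (data-driven) test looks like, here is the same concept with Python's pytest; the normalize function is a made-up stand-in and the values loosely follow the 0 / small / big / huge categories from the question.

import pytest

# Stand-in for the real code under test; abs() just makes the sketch runnable.
normalize = abs

# Each tuple is one data-driven case: (input value, expected result).
CASES = [
    (0, 0),            # zero
    (7, 7),            # normally small
    (123456, 123456),  # normally big
    (-2**31, 2**31),   # insanely huge
]

@pytest.mark.parametrize("value,expected", CASES)
def test_normalize(value, expected):
    assert normalize(value) == expected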
Related
Let's say I have a list of about 210,000 English words.
I need to use all these 210,000 words as test cases.
I need to make sure every word in that list is covered every time I run my tests.
The question is: what are the best practices for storing these words in my test?
Should I save all these words in a slice (would that be too large a slice?), or should I save them in an external file (like words.txt) and load the file line by line when needed?
Test data is usually stored in a directory named testdata to keep it separate from the other source code or data files (see the docs from the command go help test). The go tool ignores stuff inside that directory.
210,000 words should take up only single digit megabytes of RAM anyway, which isn't much. Just have a helper function that reads the words from the file before each test (perhaps caching them), or define a TestMain() function which reads them once and stores them in a global variable for access by tests that are subsequently run.
Edit: Regarding best practices, it's sometimes nicer to store test data in testdata even if the data isn't large. For example, I sometimes need to use multiple short JSON snippets in test cases, and perhaps use them more than once. Storing them in appropriately named files under a subdirectory of testdata can be more readable than littering Go code with a bunch of JSON snippets.
The slight loss of performance is generally not an issue in tests. Whichever method makes the code easier to understand could be the 'best practice'.
How do syntax highlighting tools such as Pygments and TextMate bundles do automated testing?
Tools like this often simply resort to a large collection of snippets of text representing a chosen input and the expected output. For instance, if you look at the Pygments GitHub repository, you can see they have giant lists of text files divided into an input section and a tokens section, like so:
---input---
f'{"quoted string"}'
---tokens---
'f' Literal.String.Affix
"'" Literal.String.Single
'{' Literal.String.Interpol
'"' Literal.String.Double
'quoted string' Literal.String.Double
'"' Literal.String.Double
'}' Literal.String.Interpol
"'" Literal.String.Single
'\n' Text
Since a highlighting tool reads a piece of code and has to identify which bits of text are parts of which bits of code (is this the start of a function? is this a comment? is it a variable name?), it usually performs various processing steps that result in a list of tokens like the one above, which it can then feed into the next step (insert highlights from the first Literal.String.Interpol to the next, bold any Literal.String.Single, etc., by generating the appropriate HTML or CSS or other markup relevant to the system). Checking that these tokens are generated correctly from the input text is key.
Then, depending on the language the tool is built in, you might use an existing testing suite or build your own (Pygments seems to use the Python-based tool pytest), which essentially consists of running each of the inputs through your tool in a loop, reading the output, and comparing it to the expected values. If the output doesn't match, you can display a message showing which test failed and what the input/output/expected/error values were. If an output passes, you can simply signal it with a happy green checkmark. Then, when the tests finish, the developer can hopefully reason out what they broke by looking over the results.
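A rough sketch of that loop, written as a pytest test; the snippet directory, the golden-file layout and the tokenize function are stand-ins, not Pygments' actual test code.

import glob

import pytest

from my_highlighter import tokenize  # hypothetical module: swap in your own lexer


def load_case(path):
    # Split a golden file into its input text and its expected token lines,
    # normalising whitespace so column alignment doesn't matter.
    text = open(path, encoding="utf-8").read()
    _, _, rest = text.partition("---input---\n")
    source, _, expected = rest.partition("---tokens---\n")
    return source, [" ".join(line.split()) for line in expected.splitlines() if line.strip()]


@pytest.mark.parametrize("path", sorted(glob.glob("tests/snippets/*.txt")))
def test_snippet_matches_golden_tokens(path):
    source, expected = load_case(path)
    # Format each token the same way the golden files do (format is up to you).
    actual = ["%r %s" % (value, kind) for value, kind in tokenize(source)]
    assert actual == expected, "token stream changed for %s" % path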
It is often a good idea to randomize the order in which these inputs are run, so that you can be sure that each step in the test doesn't have side effects that get passed along to the next test and cause it to pass or fail incorrectly. It might also be a good idea to time the length of the complete test run. If the whole thing took 12 seconds yesterday but now takes two minutes, we may have broken something even if all the tests technically "pass".
In tools like a code highlighter, you often have a good idea of what many of the inputs and outputs will look like before you can code everything up, for instance if a spec document already exists. In that case, it may be a good idea to include tests that you know won't pass right away, but mark them with some tag (perhaps a text marker within the file that says "NOT PASSING", or a particular file naming convention), and tell your testing suite to expect those tests to fail. Then, as you fix bugs and add features, say you fixed Bug X in your attempt to make test #144 pass. Now when you run the tests, the suite also alerts you that 10 other tests that should be failing are now passing. Congrats! You just saved yourself a lot of work trying to fix several separate problems that were actually caused by the same root issue.
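pytest, for example, supports this kind of tagging directly with its xfail marker: the test still runs, a failure is reported as XFAIL, and an unexpected pass shows up as XPASS (or as an error with strict=True), which is exactly the alert described above. A minimal, self-contained sketch:

import pytest

@pytest.mark.xfail(reason="spec feature not implemented yet", strict=True)
def test_feature_described_in_spec():
    # Placeholder body: write the real check against the spec here.
    # While the feature is missing this fails quietly as XFAIL; once the
    # feature lands, the unexpected pass is flagged because of strict=True.
    assert False, "replace with a real check against the spec"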
As the codebase is updated, a developer would run and rerun the tests to ensure that any changes they make don't break tests that were working before, and then add new tests to the collection to verify that the new feature, fixed edge case, etc., now has a known expected output that nobody will accidentally break unnoticed in the future.
I've inherited an old program that works, but has pretty ugly source code. I need to make the code nicer without changing the functionality. This program takes an input file, performs all sorts of calculations and generates an output file.
The program is currently written in a C/C++ combination. At first I'm going to keep it a C++ program, but in the not-so-far future I'm going to convert it, or parts of it, to Python.
Naturally, the original developers haven't taken the time to create unit tests, or any other kind of test. Since I want to make sure my modifications haven't changed the program's behavior, I want to start by creating some tests. These will not be unit tests, but rather tests of the entire program.
I want each test to take one input file and a set of command line arguments, run the program and compare the output (which is the output file, stdout output and stderr output) to the expected output.
Since I need to support both C++ and Python, the test framework needs to be language agnostic - it should be able to run an executable, collect stdout and stderr and compare them, as well as another file, to the prerecorded outputs.
I couldn't find a test framework that can do that. Is there anything like that? I'd rather not develop one myself.
Well, off the top of my head, you could certainly run the executable with your desired inputs in Python using subprocess or some similar module, parse the output, and then use the unittest module to set expectations on what sort of output you're looking for.
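A rough sketch of that suggestion using only the standard library; the program name, the per-case directory layout and the output file paths are placeholders you would adapt.

import subprocess
import unittest
from pathlib import Path

class EndToEndTest(unittest.TestCase):
    def check_case(self, case_dir):
        # Run the program on one recorded case and compare the output file,
        # stdout and stderr against the prerecorded expectations.
        case = Path(case_dir)
        args = (case / "args.txt").read_text().split()
        result = subprocess.run(["./myprogram", *args, str(case / "input.dat")],
                                capture_output=True, text=True)
        self.assertEqual(result.stdout, (case / "expected_stdout.txt").read_text())
        self.assertEqual(result.stderr, (case / "expected_stderr.txt").read_text())
        # "output.dat" stands in for whatever file the program writes.
        self.assertEqual(Path("output.dat").read_text(),
                         (case / "expected_output.dat").read_text())

    def test_basic_case(self):
        self.check_case("tests/cases/basic")

if __name__ == "__main__":
    unittest.main()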
In CppUnit we run unit tests as part of the build, in a post-build step, and we run multiple tests as part of this. If any test case fails, the post-build step should not stop; it should go ahead, run all the test cases, and report a summary of how many passed and how many failed. How can we achieve this?
Thanks!
The question is specific enough: you need a test runner. Encapsulate each test in its own behavior and class, and keep the test project separate from the tested code. Afterwards just configure your XMLOutputter. You can find an excellent example of how to do this on the YoLinux website: http://www.yolinux.com/TUTORIALS/CppUnit.html
This is how we compile the test projects for our main projects and check that everything is OK. From then on it is just a matter of maintaining your test code.
Your question is too vague for a precise answer. Usually a unit test engine returns a code to tell whether it has failed (like a non-zero return code in the shell on Linux) or generates an output file with the results, and the calling system handles this. If you have written the calling system yourself (some home-made scripts), you have to add an option to carry on with test execution even if an error occurred. If you are using a tool such as a continuous integration server, then you have to go through its documentation and find the option that lets it continue when tests fail.
A workaround is to write a script that returns an "OK" result even if the unit tests fail, but then you lose some automatic verification ...
Be more specific if you want more clues.
my2c
I would just write your tests this way: instead of using the CPPUNIT_ASSERT macros or whatever, write them in regular C++ with some way of logging errors.
You could use a macro for this too of course. Something like:
LOGASSERT( some_expression )
could be defined to execute some_expression and to log the expression together with __FILE__ and __LINE__ if it fails. You can also log exceptions, as well as expected exceptions that were not thrown, simply by checking for them in your tests (with macros if you want to log the expression that caused them together with __FILE__ and __LINE__).
If you are writing macros I would advise you to limit the content of your macro to calling an inline function with extra parameters.
I'm one of the people involved in the Test Anything Protocol (TAP) IETF group (if interested, feel free to join the mailing list). Many programming languages are starting to adopt TAP as their primary testing protocol and they want more from it than what we currently offer. As a result, we'd like to get feedback from people who have a background in xUnit, TestNG or any other testing framework/methodology.
Basically, aside from a simple pass/fail, what information do you need from a test harness? Just to give you some examples:
Filename and line number (if applicable)
Start and end time
Diagnostic output such as the difference between what you got and what you expected.
And so on ...
Most definitely all things from your list for each individual item:
Filename
Line number
Namespace/class/function name
Test coverage
Start time and end time
And/or total time (this would be more useful for me than the top two items)
Diagnostic output such as the difference between what you got and what you expected.
Off the top of my head not much else, but for a group of tests I would like to know:
group name
total execution time
It must be very, very easy to write a test, and equally easy to run them. That, to me, is the single most important feature of a testing harness. If someone has to fire up a GUI or jump through a bunch of hoops to write a test, they won't use it.
An arbitrary set of tags - so I can mark a test as, for example "integration, UI, admin".
(you knew I was going to ask for this didn't you :-)
To what you said I'd add:
Method/function/class name
Coverage counting tool, with exceptions (Do not count these methods)
Result of N last runs available
Mandate that ways to easily parse test results must exist
Any sort of diagnostic output - especially on failure - is critical. If a test fails, you don't want to always have to rerun it under a debugger to see what happened - there should be some clues in the output.
I also like to see a before and after snapshot of critical system variables like memory or hard disk space available as those can provide great clues as well.
Finally, if you're using random seeds for any of the tests, write the seed out to the logfile so that the test can be reproduced if necessary.
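The snapshot and seed ideas are cheap to implement; here is a sketch using only the Python standard library (free disk space stands in for whatever system variables matter to you, and where the output goes will depend on your harness).

import random
import shutil
import time

# Record a couple of "before" facts that make failures easier to diagnose
# and reproduce: the free disk space and the random seed used for this run.
free_before = shutil.disk_usage(".").free
seed = int(time.time())
print("# free disk space before: %d bytes" % free_before)
print("# random seed for this run: %d" % seed)
random.seed(seed)

# ... run the tests ...

print("# free disk space after: %d bytes" % shutil.disk_usage(".").free)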
I'd like the ability to concatenate and nest TAP streams.
A unique id (uuid, md5sum) to be able to identify an individual test -- say, for use when inserting test results in a database, or identifying them in a bug tracker to make it possible for QA to rerun an individual test.
This would also make it possible to trace an individual test's behavior from build-to-build through the entire lifecycle of multiple revisions of a product. This could eventually allow larger-scale correlations between 'historic' events (new hire, product release, hardware upgrades) and the profile(s) of tests that fail as a result of such events.
I'm also thinking that TAP should be emitted through a dedicated side channel rather than mixed in with stdout. I'm not sure whether this is within the scope of the protocol definition.
I use TAP as output protocol for a set of simple C++ test methods, and have seen the following shortcomings:
test steps cannot be put into groups (there is only the grouping into several test scripts; but for running all tests in our software, I need at least one more level of grouping, so that a single test step would be identified by something like "DB connection" -> "Reconnection Test" -> "test step #3")
seeing differences between expected and actual output is useful; I either print the diff to stderr (as a comment) or actually launch a graphical diff tool (a sketch of the comment approach is below)
the protocol and tools must be really language-independent. For example, so far I only know of the Perl "prove" tool for running tests, which is limited to running Perl scripts
In the end, the test output must be suitable as basis for easily generating an HTML report file which lists succeeded tests very concisely, gives detailed output for failed tests, and makes it possible to quickly jump into the IDE to the failing test line.
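The diff-as-comment idea from the second bullet above works in any language; purely as an illustration, here it is in Python with the standard difflib module (the test number and description are made up).

import difflib

def tap_compare(test_number, description, expected, actual):
    # Emit one TAP result line; on failure, attach a unified diff as '#' comments.
    if actual == expected:
        print("ok %d - %s" % (test_number, description))
    else:
        print("not ok %d - %s" % (test_number, description))
        diff = difflib.unified_diff(expected.splitlines(), actual.splitlines(),
                                    "expected", "actual", lineterm="")
        for line in diff:
            print("# " + line)

tap_compare(1, "reconnection test output", "a\nb\n", "a\nc\n")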
optional ASCII coloured output: green for good, yellow for pending, red for errors
the idea of things being pending
a summary at the end of the test report of commands that will run the individual tests where
something went wrong
something in the test was pending
Extension idea for TAP:
1..4
ok 1 - yay
not ok 2 - boo
ok 3 - yay #json:{...}
ok 4 - see my json
Ability to attach a #json comment...
- can be safely ignored by existing code
- well-defined tags can be easily reserved at testanything.org
- easy to produce, parse and read complex types
- yaml is a pain
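For what it's worth, here is a sketch of what producing and consuming such a line could look like; the #json tag and the payload follow the example above and are only a proposal, not part of TAP today.

import json

def tap_line(number, ok, description, payload=None):
    # Build a TAP result line, optionally carrying structured data in a #json comment.
    line = "%s %d - %s" % ("ok" if ok else "not ok", number, description)
    if payload is not None:
        line += " #json:" + json.dumps(payload)
    return line

def parse_payload(line):
    # Return the decoded #json payload of a TAP line, or None if there isn't one.
    _, sep, blob = line.partition("#json:")
    return json.loads(blob) if sep else None

print(tap_line(3, True, "yay", {"elapsed_ms": 12}))
print(parse_payload('ok 3 - yay #json:{"elapsed_ms": 12}'))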