What do you need from a test harness? - unit-testing

I'm one of the people involved in the Test Anything Protocol (TAP) IETF group (if interested, feel free to join the mailing list). Many programming languages are starting to adopt TAP as their primary testing protocol and they want more from it than what we currently offer. As a result, we'd like to get feedback from people who have a background in xUnit, TestNG or any other testing framework/methodology.
Basically, aside from a simple pass/fail, what information do you need from a test harness? Just to give you some examples:
Filename and line number (if applicable)
Start and end time
Diagnostic output such as the difference between what you got and what you expected.
And so on ...

Most definitely all things from your list for each individual item:
Filename
Line number
Namespace/class/function name
Test coverage
Start time and end time
And/or total time (this would be more useful for me than the top two items)
Diagnostic output such as the
difference between what you got and
what you expected.
From the top of my head not much else but for the group of tests I would like to know
group name
total execution time

It must be very, very easy to write a test, and equally easy to run them. That, to me, is the single most important feature of a testing harness. If someone has to fire up a GUI or jump through a bunch of hoops to write a test, they won't use it.

An arbitrary set of tags - so I can mark a test as, for example "integration, UI, admin".
(you knew I was going to ask for this didn't you :-)

To what you said I'd add:
Method/function/class name
Coverage counting tool, with exceptions (Do not count these methods)
Result of N last runs available
Mandate that ways to easily parse test results must exist

Any sort of diagnostic output - especially on failure is critical. If a test fails, you don't want to always have to rerun the test under a debugger to see what happened - there should be some cludes in the output.
I also like to see a before and after snapshot of critical system variables like memory or hard disk space available as those can provide great clues as well.
Finally, if you're using random seeds for any of the tests, write the seed out to the logfile so that the test can be reproduced if necessary.

I'd like the ability to concatenate and nest TAP streams.

A unique id (uuid, md5sum) to be able to identify an individual test -- say, for use when inserting test results in a database, or identifying them in a bug tracker to make it possible for QA to rerun an individual test.
This would also make it possible to trace an individual test's behavior from build-to-build through the entire lifecycle of multiple revisions of a product. This could eventually allow larger-scale correlations between 'historic' events (new hire, product release, hardware upgrades) and the profile(s) of tests that fail as a result of such events.
I'm also thinking that TAP should be emitted through a dedicated side-channel rather than mixed in with stdout. I'm not sure this is under the scope of the protocol definition.

I use TAP as output protocol for a set of simple C++ test methods, and have seen the following shortcomings:
test steps cannot be put into groups (there's only the grouping into several test scripts; but for running all tests in our software, I need at least one more level of grouping, so that a single test step would be identified by like "DB connection" -> "Reconnection Test" -> "test step #3")
seeing differences between expected and actual output is useful; I either print the diff to stderr (as comment) or actually launch a graphical diff tool
the protocol and tools must be really language-independent. For example, so far I only know of the Perl "prove" tool for running tests, which is limited to running Perl scripts
In the end, the test output must be suitable as basis for easily generating an HTML report file which lists succeeded tests very concisely, gives detailed output for failed tests, and makes it possible to quickly jump into the IDE to the failing test line.

optional ascii coloured output, green for good, yellow for pending, red for errors
the idea of things being pending
a summary at the end of the test report of commands that will run the individual tests where
List item
something went wrong
something in the test was pending

Extension idea for TAP:
1..4
ok 1 - yay
not ok 2 - boo
ok 3 - yay #json:{...}
ok 4 - see my json
Ability to attach a #json comment...
- can be safely ignored by existing code
- well-defined tags can be easily reserved at testanything.org
- easy to produce, parse and read complex types
- yaml is a pain

Related

How do syntax highlighting tools implement automated testing?

How do syntax highlighting tools such as pygments and textmate bundle do automated testing?
Tools like this often simply resort to a large collection of snippets of text representing a chosen input and the expected output. For instance if you look at the Pygments Github, you can see they have giant lists of text files divided into an input section and a tokens section like so:
---input---
f'{"quoted string"}'
---tokens---
'f' Literal.String.Affix
"'" Literal.String.Single
'{' Literal.String.Interpol
'"' Literal.String.Double
'quoted string' Literal.String.Double
'"' Literal.String.Double
'}' Literal.String.Interpol
"'" Literal.String.Single
'\n' Text
Since a highlighting tool reads a piece of code and then has to identify which bits of text are parts of which bits of code (is this the start of a function? is this a comment? is it a variable name?), they usually perform various processing steps that will result in a list of tokens as above, which they can then feed into the next step (insert highlights from the first Literal.String.Interpol to the next, bold any Literal.String.Single, etc. by generating the appropriate HTML or CSS or other markup relevant to the system). Checking that these tokens are generated properly from the input text is key.
Then, depending on the language the tool is built in you might use an existing testing suite or build your own (pygments seems to use a Python-based tool called pyTest), which essentially consists of running each of the inputs through your tool in a loop, reading the output, and comparing it to the expected values. If the output doesn't match, you can display a message showing what test failed, what the input/output/expected/error values were. If an output passes, you could simply signal with a happy green checkmark. Then when the test finishes, the developer can hopefully reason out what they broke by looking over the results.
It is often a good idea to randomize the order that these inputs so that you can be sure that each step in the test doesn't have side effects that are getting passed along to the next test and cause it to pass or fail incorrectly. It might also be a good idea to time the length of the complete test. If the whole thing was taking 12 seconds yesterday, but now it takes two minutes, we may have broken something even if all the test technically "pass".
In tools like a code highlighter, you often have a good idea of what many of the inputs and outputs will look like before you can code everything up, for instance if some spec document already exists. In that case, it may be a good idea to include tests that you know won't pass right away, but mark them with some tag (perhaps some text marker within the file that says "NOT PASSING", or naming the file in a certain way), and telling your testing suite to expect those tests to fail. Then, as you fix bugs and add features, say you fixed Bug X in your attempt to make test #144 pass. Now when you run the text, it also alerts you that 10 other tests that should be failing are now passing. Congrats! You just saved yourself a lot of work trying to fix several separate problems that were actually caused by the same root issue.
As the codebase is updated, a developer would run and rerun the test to ensure that any changes he makes doesn't break tests that were working before, and then would add new tests to the collection to verify that his new feature, fixed edge case, etc., now has a known expected output that you can be sure someone won't accidentally break in the future.

SAS Enterprise Guide - process dependence and parallel execution

I am working on a project in SAS EG (7.1) which involves process dependence and parallel execution, as depicted below:
I have the following questions:
Is there a way to retrieve or set relations (i.e. process_C --> program_D) between the processes programmatically? The maintenance is becoming problematic with complex projects. Ideally, I would like to be able to re-create the links between processes from external table.
I start the whole process with the option “Run branch from <>” process. Let’s assume that we have only 2 processors available. Is there a way to set the order of execution between process_A, B, C? The critical path of the whole flow is “begin -> process_C -> process_D -> end” hence we would like it to start with process_C in order to ensure minimum execution time.
Thank you in advance.
For 1, I think the answer is "no", if you mean a well defined SAS programmatic method. At least for the relatively limited information and example you provide above, anyway. More might be possible with metadata server - not my area of expertise.
You can do some of this at least using scripting via Powershell or VBScript. EG's API is fairly wide open and not all that hard to use. I won't suggest how as my understanding of this is limited also, but it seems like it should be possible to do what you suggest, though probably not easy.
For your second point:
First off, EG typically runs "top to bottom" if it has no other information on how to process a particular choice. So put c->d above a/b to get it processed first.
Second, you could use conditional processing perhaps. There should be a macro variable that tells you how many cpus you have (&SYSNCPU on my machine, hopefully same on other versions). You could use that value to conditionally link to A then B as opposed to A+B simultaneously. I'm not sure how easy this would be to do in a flexible fashion, though.

the best approaches for logging localization using c++

I am working on a multinational project where target audience for logs might be from two nationalities. Therefore it is becoming important to log in more than one language , I am thinking about writing to 2 different log folders based on language every time I am logging something, but I am also wondering if there's some out of the box functionality that is coming along with logging frameworks like log4cpp?
As other commenters have mentioned, it sounds like you are going down the wrong track by looking to do multilingual logging.
My recommendation would be to use English (which is the standard for technical information, and which I guess is the language you know best) and to make sure that the language you use is clear, grammatically correct and unambiguous. Then if one of the technicians cannot understand it, they can very easily and efficiently run it through a machine translation engine such as Google Translate. Or indeed they could process the logs and run everything through Google Translate to append translated text, particularly if you annotate the logs to mark the language content.
Assuming that the input language is well-written, machine transation usually gives a good result which the end user can understand. If the message isn't clear, has typos or abbreviations, then that's where machine translation fails spectacularly.
Writing log naturally brings down the speed of execution due to file open, seek and write operations involved as part of it.
This is one primary reason why many developer and architects suggest to write log at different levels.Increasing the depth of log entries as level increases to trace down the problems better. At higher level, you will notice that your process speed drops due to more log entries getting generated.
Rather suggest you to use services that can translate from one language to other.
I'm sure there are libraries free or paid which does this translation. You can create a small utility program that runs in the background and does this conversion during process idle time.
Well one suggestion is you can use a different process/thread which listens for your log messages, which you can log it from there ..
This reduces I/O logging time in your main process/thread and you can make all changes related to Logging language over there..
For multi - Lingual support I think you can try writing with widechar string .. though I am not sure..
the best approaches for logging localization using c++
Install Qt 4 and use QObject::tr/ tr() macro for strings. Write strings in whatever language you want. Hire/Get a translator to localize strings using QT Linguist.
Please note that perfect translation is impossible, so there will be many "amusing" misunderstandings, even if your translator is a genius. So it might be a better idea to select main language for programming team.
--EDIT--
Didn't notice this part before:
in more than one language
One way to approach it is to implement log reader. Instead of writing plaintext messages, you could dump message ids (generated by some kind of macros) and string arguments if strings are formatted. "Log reader" will allow user to select desired language while viewing log file, and translate messages based on their ids/arguments using mechanism similar to QTranslator. The good thing about this approach is that you'll be able to add more languages later - so it'll be possible to retranslate old logs. The bad thing is that this format will be harder to read for "normal human", although you can add plaintext messages in addition to message ids and arguments and you'll need to write log viewer.
Qt 4 has most of this framework implemented (there are routines for dumping variants into text/data streams, and so on) along with translation tool. See QTranslator documentation and Linguist manual for more info.

CPP unit setup for C++

In CPP unit we run unit test as part of build as part of post build setup. We will be running multiple tests as part of this. In case if any test case fails post build should not stop, it should go ahead and run all the test cases and should report summary how many test cases passed and failed. how can we achieve this.
Thanks!
His question is specific enough. You need a test runner. Encapsulate each test in its own behavior and class. The test project is contained separately from the tested code. Afterwards just configure your XMLOutputter. You can find an excellent example of how to do this in the linux website. http://www.yolinux.com/TUTORIALS/CppUnit.html
We use this way to compile our test projects for our main projects and observe if everything is ok. Now it all becomes the work of maintaining your test code.
Your question is too vague for a precise answer. Usually, a unit test engine return a code to tell it has failed (like a non zero return code in the shell on linux) or generate some output file with results. The calling system handle this. If you have written it (some home made scripts) you have to give the option to go on tests execution even if an error occurred. If you are using some tools like continuous integration server, then you have to go through the doc and find the option that allows you to go on when tests fails.
A workaround is to write a script that return a "OK" result even if the unit test fails, but there you lose some automatic verification ...
Be more specific if you want more clues.
my2c
I would just write your tests this way. Instead of using the CPPUNIT_ASSERT macros or whatever you would write them in regular C++ with some way of logging errors.
You could use a macro for this too of course. Something like:
LOGASSERT( some_expression )
could be defined to execute some_expression and to log the expression together with FILE and LINE if it fails, and you can also log exceptions of course, as well as ones that are not thrown, simply by writing them in your tests (with macros if you want to log the expression that caused them with FILE and LINE).
If you are writing macros I would advise you to limit the content of your macro to calling an inline function with extra parameters.

Suggestions for unit testing

Hello another question concerning debugging : Automatically generating test cases when i know the parameterset. And doing it all at once, instead during development (could kick myself)
i have a set of parameters for my software that i wish to test. (~ 12 parameters only). However of course these parameters are often integers, so for every parameter i can have 4 values that make sense(0, insanely huge, normally big, normally small).
is there a way i can generate my testcases automatically? would save me a lot of time. I already have to inspect every test case by hand, do i not? Alot of my program produces output to the console so normal assertions probably wont work, also i work on home made datastructures most of the time, so i could not use a simple assertion.
My dream option would be kind of a reverse regular expression, where i set the rules and get myself some file generated that i can use as an input (my software has a crude scripting language). that way i can assemble all input files and test them one by one.
Looking forward to listening to your kind suggestions.
cheers
There are lots of ways to generate test cases in your scenario -- though you're a bit vague on what form the inputs for your programs and units need to take. For one of my Fortran programs I use a template input parameter file, a bash script and a make file. The make file, when called on the test phony target:
a) compiles the program;
b) runs the bash script, which uses sed to replace placeholders in the template parameter file, to create 128 (or whatever) test input files;
c) submits all the test jobs to the job management system on our cluster.
Once they jobs have finished I have some other scripts to compare outputs with benchmarks, collect statistics, that sort of thing.
If you need more specific advice, post more specific questions.
EDIT: Using sed inside a bash script:
Suppose that the parameter input template file contains 3 codes to be replaced: $FREQ$, $NUM$ and $TOL$. Then I write a bash script with a 3-deep loop nest something like this:
for frq in 0.01 0.0 1 10
do
for np in 1 2 4 8 16
do
for tol in 0.001 0.0001 0.00001
sed ....
done
done
done
It's not pretty but it works, and it saves me wrestling with much more sophisticated solutions such as xUnit testing or Python programming.
I suggest you read something about data-driven unit testing.
There are lots of frameworks that can help you with that.
You may start here: http://www.slideshare.net/dnastacio/datadriven-unit-testing-for-java-1933154.
I see that you work with FORTRAN and you probably deal with one of FORTRAN's versions of xUnit. Being user of JUnit I'd suggest parameterized tests - see if the concept applies in your case.