What is the purpose of unit testing? - c++

For learning purposes, I dived into the field of unit testing. I've read through a few tutorials on the subject using Qt, and came up with the following:
#include <QtTest>
#include <map>

class QMyUnitTest : public QObject
{
    Q_OBJECT
private:
    bool isPrime(unsigned int ui);
private Q_SLOTS:
    void myTest();
};

bool QMyUnitTest::isPrime(unsigned int n) {
    typedef std::map<unsigned int, bool> Filter;
    Filter filter;
    for(unsigned int ui = 2; ui <= n; ui++) {
        filter[ui] = true;
    }
    unsigned int ui = filter.begin()->first;
    for(Filter::iterator it = filter.begin();
        it != filter.end(); it++) {
        if(it->second) {
            for(unsigned int uj = ui * ui; uj <= n; uj += ui) {
                filter[uj] = false;
            }
        }
        ui++;
    }
    return filter[n];
}
void QMyUnitTest::myTest() {
}
QTEST_MAIN(QMyUnitTest)
#include "tst_myunittest.moc"
I know my prime-finding algorithm is inefficient and deliberately flawed; it is meant to be this way. Now I want to test it thoroughly, but the following question arose:
To test properly, do I not have to have a very precise idea of what could go wrong?
Of course I can run through the first 1000 prime numbers and check that they come out true, or through 1000 non-prime numbers and check that they come out false, but that might not catch the flaws in the algorithm (for example, return filter[n]; is obviously horrible, as filter[n] might not exist if n < 2).
What use is unit testing if I already have to know what potential problems of my function are?
Am I doing something wrong? Is there a better way to test?

The purpose of unit testing is to verify that the code you wrote actually does what it is supposed to do.
To write correct and complete tests, you need to know precisely what your code is supposed to do. You then write tests to verify:
normal conditions (i.e. "normal", "usual" inputs produce the expected output)
border conditions (i.e. input that is just barely within spec)
failure conditions (i.e. you verify that your code fails correctly when the input is invalid).
Your example of testing the behavior of your isPrime routine with numbers it doesn't handle is a good one. Without knowing your implementation, 0, 1, 2 and negative values would be good test cases for an isPrime routine - they are things you might not think about when implementing your algorithm, so they are valuable tests.
Note that verifying normal conditions isn't necessarily the easiest part. As in this case, making sure your algorithm is perfect requires a mathematical analysis, then a verification that your code implements that analysis correctly - and this is sometimes hard. Checking a few hundred known values is not necessarily enough (it might start failing at the 1001st value).
What use is unit testing if I already have to know what potential problems of my function are?
You've got it reversed. Don't write your unit tests with your implementation code in mind. Write them with the specification in mind. Your code must fit the spec, and your tests must ensure that as much as they can. (Code coverage tools will help you find border conditions once the bulk of your testing is done.)
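To make that concrete, here is a minimal sketch (mine, not from the question) of what the empty myTest() slot could contain, with the expected values taken from the mathematical definition of primality rather than from the implementation:
void QMyUnitTest::myTest() {
    // normal conditions: typical inputs well inside the spec
    QVERIFY(isPrime(7));
    QVERIFY(!isPrime(9));
    // border conditions: the smallest inputs, exactly where the question
    // already suspects trouble with filter[n] for n < 2
    QVERIFY(!isPrime(0));
    QVERIFY(!isPrime(1));
    QVERIFY(isPrime(2));
    QVERIFY(isPrime(3));
    // extreme conditions: unsigned int rules out negatives at compile time,
    // but a large composite still probes the sieve's limits
    QVERIFY(!isPrime(1000000u));
}
A crash on the border cases would be the test doing its job: it exposes the flaw the question already suspects. If you also want bulk checks of the first thousand values, Qt's data-driven mechanism (QTest::addColumn / QFETCH) keeps that manageable.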

Just wanna add to what Mat already listed:
Unit tests will ensure that you, and others who work on your code in the future, won't change its behavior by accident.
If written correctly, unit tests are also a great way to document the behavior of your code.


Unit testing cyclomatically complicated but otherwise trivial calculations

Let's say I have a calculator class whose primary function is to do the following (this code is simplified to make the discussion easier, please don't comment on the style of it)
double pilingCarpetArea = (hardstandingsRequireRemediation == true) ? hardStandingPerTurbineDimensionA * hardStandingPerTurbineDimensionB * numberOfHardstandings * proportionOfHardstandingsRequiringGroundRemediationWorks : 0;
double trackCostMultipler;
if (trackConstructionType == TrackConstructionType.Easy) trackCostMultipler = 0.8;
else if (trackConstructionType == TrackConstructionType.Normal) trackCostMultipler = 1;
else if (trackConstructionType == TrackConstructionType.Hard) trackCostMultipler = 1.3;
else throw new OutOfRangeException("Unknown TrackConstructionType: " + trackConstructionType.ToString());
double PilingCostPerArea = TrackCostPerMeter / referenceTrackWidth * trackCostMultipler;
There are at least 7 routes through this class that I should probably test: the combinations of trackCostMultiplier and hardstandingsRequireRemediation (6 combinations), plus the exception condition. I might also want to add some for divide by zero and overflow and suchlike if I was feeling keen.
So far so good, I can test this number of combinations easily and stylishly. And actually I might trust that multiplication and addition are unlikely to go wrong, and so just have 3 tests for trackCostMultipler and 2 for hardstandingsRequireRemediation, instead of testing all possible combinations.
However, this is a simple case, and the logic in our apps is unfortunately cyclomatically much more complicated than this, so the number of tests could grow huge.
There are some ways to tackle this complexity
Extract the trackCostMultipler calculation to a method in the same class
This is a good thing to do, but it doesn't help me test it unless I make this method public, which is a form of "Test Logic In Production". I often do this in the name of pragmatism, but I would like to avoid if I can.
Defer the trackCostMultipler calculation to a different class
This seems like a good thing to do if the calculation is sufficiently complex, and I can test this new class easily. However I have just made the testing of the original class more complicated, as I will now want to pass in a ITrackCostMultipler "Test Double" of some sort, check that it gets called with the right parameters, and check that its return value is used correctly. When a class has, say, ten sub calculators, its unit / integration test becomes very large and difficult to understand.
I use both (1) and (2), and they give me confidence and they make debugging a lot quicker. However there are definitely downsides, such as Test Logic in Production and Obscure Tests.
I am wondering what others' experiences of testing cyclomatically complicated code are. Is there a way of doing this without the downsides? I realise that Test Specific Subclasses can work around (1), but this seems like a legacy technique to me. It is also possible to manipulate the inputs so that various parts of the calculation return 0 (for addition or subtraction) or 1 (for multiplication or division) to make testing easier, but this only gets me so far.
Thanks
Cedd
Continuing the discussion from the comments to the OP, if you have referentially transparent functions, you can first test each small part by itself, and then combine them and test that the combination is correct.
Since constituent functions are referentially transparent, they are logically interchangeable with their return values. Now the only remaining step would be to prove that the overall function correctly composes the individual functions.
This is a great fit for property-based testing.
As an example, assume that you have two parts of a complex calculation:
module MyCalculations =
    let complexPart1 x y = x + y // Imagine it's more complex
    let complexPart2 x y = x - y // Imagine it's more complex
Both of these functions are deterministic, so assuming that you really want to test a facade function that composes these two functions, you can define this property:
open FsCheck.Xunit
open Swensen.Unquote
open MyCalculations
[<Property>]
let facadeReturnsCorrectResult (x : int) (y : int) =
    let actual = facade x y
    let expected = (x, y) ||> complexPart1 |> complexPart2 x
    expected =! actual
Like other property-based testing frameworks, FsCheck will throw lots of randomly generated values at facadeReturnsCorrectResult (100 times, by default).
Given that both complexPart1 and complexPart2 are deterministic, but you don't know what x and y are, the only way to pass the test is to implement the function correctly:
let facade x y =
    let intermediateResult = complexPart1 x y
    complexPart2 x intermediateResult
You need another abstraction level to make your methods simpler, so it will be easier to test them:
doStuff(trackConstructionType, referenceTrackWidth) {
    ...
    trackCostMultipler = countTrackCostMultipler(trackConstructionType)
    pilingCostPerArea = countPilingCostPerArea(referenceTrackWidth, trackCostMultipler)
    ...
}

countTrackCostMultipler(trackConstructionType) {
    double trackCostMultipler;
    if (trackConstructionType == TrackConstructionType.Easy) trackCostMultipler = 0.8;
    else if (trackConstructionType == TrackConstructionType.Normal) trackCostMultipler = 1;
    else if (trackConstructionType == TrackConstructionType.Hard) trackCostMultipler = 1.3;
    else throw new OutOfRangeException("Unknown TrackConstructionType: " + trackConstructionType.ToString());
    return trackCostMultipler;
}

countPilingCostPerArea(referenceTrackWidth, trackCostMultipler) {
    return TrackCostPerMeter / referenceTrackWidth * trackCostMultipler;
}
Sorry for the code, I don't know the language, but it does not really matter...
If you don't want to make these methods public, then you have to move them to a separate class and make them public there. The class name could be TrackCostMultiplerAlgorithm or ..Logic or ..Counter, or something like that. That way you will be able to inject the algorithm into the higher-level code if you ever have several different algorithms. Everything depends on the actual code.
Oh, and don't worry about method and class lengths: if you really need a new method or class because the code is too complex, then create one! It does not matter that it will be short. It will always ease understanding as well, because you can put what the method does into its name; the code inside the method only tells us how it does it.
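For what it's worth, here is a rough C++ sketch of that idea; the names (ITrackCostMultiplier, TrackCostMultiplierTable, PilingCostCalculator) are mine, not from the question. The lookup table becomes a small, publicly testable type, and the higher-level calculation receives it through its constructor, so its own tests can pass a trivial stub instead of re-testing the table:
#include <stdexcept>

enum class TrackConstructionType { Easy, Normal, Hard };

struct ITrackCostMultiplier {
    virtual ~ITrackCostMultiplier() = default;
    virtual double multiplierFor(TrackConstructionType type) const = 0;
};

// Small and publicly testable on its own.
struct TrackCostMultiplierTable : ITrackCostMultiplier {
    double multiplierFor(TrackConstructionType type) const override {
        switch (type) {
            case TrackConstructionType::Easy:   return 0.8;
            case TrackConstructionType::Normal: return 1.0;
            case TrackConstructionType::Hard:   return 1.3;
        }
        throw std::out_of_range("Unknown TrackConstructionType");
    }
};

// The higher-level calculator depends only on the interface, so its tests
// can inject a stub that returns a fixed multiplier.
class PilingCostCalculator {
public:
    explicit PilingCostCalculator(const ITrackCostMultiplier& multiplier)
        : multiplier_(multiplier) {}

    double pilingCostPerArea(double trackCostPerMeter,
                             double referenceTrackWidth,
                             TrackConstructionType type) const {
        return trackCostPerMeter / referenceTrackWidth * multiplier_.multiplierFor(type);
    }

private:
    const ITrackCostMultiplier& multiplier_;
};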

Testing a function: what more should be tested?

I am writing a function that takes three integer inputs and based on a relation between the three, it returns a value or error. To test this, I have written some test cases which include testing illegal values, boundary conditions for integers including overflows and some positive tests too. I am wondering what else should be tested for this simple function?
Can testing on different platforms make sense as a test case for such a small function?
Also, testing execution times is another thing that I wanted to add as a test case.
Can doing static and dynamic analysis be a part of the test cases?
Anything else that should be tested?
int foo(int a, int b, int c) {
    // return a value based on a, b, and c
}
The way you ask your question, it seems you are doing black-box testing, i.e. you only know about the relation between input and output, not about the implementation. In that case your test cases should depend on what you know about the relation, and it sounds like you have tested those things (you didn't give us details on the relation).
From that it doesn't look as if you need to test for platform independence, but if you have an automated test suite, it is certainly not a bad idea to run it on different platforms.
Now if you have the code available, you could go for white-box tests. Typically you would do this by looking at your code structure first, e.g. aiming for 100% branch coverage, so that every branch in your code is run at least once during the tests. Static and dynamic analysis can help you measure different kinds of coverage.
I wouldn't go for a platform-independence test if there is no platform-dependent code in your function.
sizeof(int) should be checked for the particular compiler. Although this seems trivial, the C standard only guarantees a minimum range for int, so it's always better to know whether the compiler being used gives you, say, a 16-bit int. Just another test case.
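One cheap way to act on that last point, sketched in C++ (the 32-bit figure below is an assumption about what foo() needs, not something stated in the question):
#include <climits>

// The C and C++ standards only guarantee that int covers at least -32767..32767,
// so pin down the width foo() actually relies on instead of assuming it:
static_assert(sizeof(int) * CHAR_BIT >= 32,
              "foo() assumes an int of at least 32 bits");

// The same limits (INT_MIN, INT_MAX, 0, and combinations of them) also make
// natural boundary inputs for the black-box test cases discussed above.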

Is data-driven testing bad?

I've started using googletest to implement tests and stumbled across this quote in the documentation regarding value-parameterized tests
You want to test your code over various inputs (a.k.a. data-driven
testing). This feature is easy to abuse, so please exercise your good
sense when doing it!
I think I'm indeed "abusing" the system when doing the following and would like to hear your input and opinions on this matter.
Assume we have the following code:
template<typename T>
struct SumMethod {
    T op(T x, T y) { return x + y; }
};

// optimized function to handle different input array sizes
// in the most efficient way
template<typename T, class Method>
T f(T input[], int size) {
    Method m;
    T result = (T) 0;
    if(size <= 128) {
        // use m.op() to compute result etc.
        return result;
    }
    if(size <= 256) {
        // use m.op() to compute result etc.
        return result;
    }
    // ...
}

// naive and correct, but slow alternative implementation of f()
template<typename T, class Method>
T f_alt(T input[], int size);
Ok, so with this code, it certainly makes sense to test f() (by comparison with f_alt()) with different input array sizes of randomly generated data to test the correctness of branches. On top of that, I have several structs like SumMethod, MultiplyMethod, etc, so I'm running quite a large number of tests also for different types:
typedef MultiplyMethod<int> MultInt;
typedef SumMethod<int> SumInt;
typedef MultiplyMethod<float> MultFlt;
// ...
ASSERT_EQ((f<int, MultInt>(int_in, 128)), (f_alt<int, MultInt>(int_in, 128)));
ASSERT_EQ((f<int, MultInt>(int_in, 256)), (f_alt<int, MultInt>(int_in, 256)));
// ...
ASSERT_EQ((f<int, SumInt>(int_in, 128)), (f_alt<int, SumInt>(int_in, 128)));
ASSERT_EQ((f<int, SumInt>(int_in, 256)), (f_alt<int, SumInt>(int_in, 256)));
// ...
const float ep = 1e-6f;
ASSERT_NEAR((f<float, MultFlt>(flt_in, 128)), (f_alt<float, MultFlt>(flt_in, 128)), ep);
ASSERT_NEAR((f<float, MultFlt>(flt_in, 256)), (f_alt<float, MultFlt>(flt_in, 256)), ep);
// ...
Now of course my question is: does this make any sense and why would this be bad?
In fact, I have found a "bug" when running tests with floats where f() and f_alt() would give different values with SumMethod due to rounding, which I could improve by presorting the input array, etc. From this experience I consider this actually somewhat good practice.
I think the main problem is testing with "randomly generated data". It is not clear from your question whether this data is re-generated each time your test harness is run. If it is, then your test results are not reproducible. If some test fails, it should fail every time you run it, not once in a blue moon, upon some weird random test data combination.
So in my opinion you should pre-generate your test data and keep it as a part of your test suite. You also need to ensure that the dataset is large enough and diverse enough to offer sufficient code coverage.
Moreover, as Ben Voigt commented below, testing with random data alone is not enough. You need to identify corner cases in your algorithms and test them separately, with data tailored specifically for these cases. However, in my opinion, additional testing with random data is also beneficial when/if you are not sure that you know all your corner cases. You may hit them by chance using random data.
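As an illustration of the pre-generated/reproducible point, one low-tech option with googletest is to fix the seed, so the "random" data is identical on every run. In the sketch below, f, f_alt and SumInt are the names from the question; makeTestData is mine:
#include <gtest/gtest.h>
#include <random>
#include <vector>

// A fixed seed turns the generator into a deterministic data set that is
// effectively part of the test suite, so any failure is reproducible.
std::vector<int> makeTestData(std::size_t size) {
    std::mt19937 rng(42);  // fixed seed: same "random" data every run
    std::uniform_int_distribution<int> dist(-1000, 1000);
    std::vector<int> data(size);
    for (int& v : data) v = dist(rng);
    return data;
}

TEST(SumMethodInt, MatchesNaiveImplementation) {
    std::vector<int> input = makeTestData(256);
    // The extra parentheses keep the commas in the template arguments
    // from confusing the macro.
    ASSERT_EQ((f_alt<int, SumInt>(input.data(), 128)),
              (f<int, SumInt>(input.data(), 128)));
    ASSERT_EQ((f_alt<int, SumInt>(input.data(), 256)),
              (f<int, SumInt>(input.data(), 256)));
}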
The problem is that you can't assert correctness on floats the same way you do on ints.
Check correctness within a certain epsilon, i.e. a small allowed difference between the calculated and expected values. That's the best you can do, and it is true for all floating-point numbers.
I think I'm indeed "abusing" the system when doing the following
Did you think this was bad before you read that article? Can you articulate what's bad about it?
You have to test this functionality sometime. You need data to do it. Where's the abuse?
One of the reasons why it could be bad is that data-driven tests are harder to maintain, and over a longer period of time it's easier to introduce bugs in the tests themselves.
For details look here: http://googletesting.blogspot.com/2008/09/tott-data-driven-traps.html
Also, from my point of view, unit tests are most useful when you are doing serious refactoring and you are not sure whether you have changed the logic in the wrong way.
If your random-data test fails after that kind of change, then you are left guessing: is it because of the data or because of your changes?
However, I think it can be useful (the same as stress tests, which are also not 100% reproducible). But if you are using a continuous integration system, I'm not sure whether data-driven tests with a huge amount of randomly generated data should be included in it.
I would rather set up a separate job which periodically runs a lot of random tests at once (so the chance of discovering something bad is quite high every time you run it), but that is too resource-heavy to be part of the normal test suite.

Should unit tests be black box tests or white box tests?

Say I have three methods, all very similar but with different input types:
void printLargestNumber(int a, int b) { ... }
void printLargestNumber(double a, double b) { ... }
void printLargestNumber(String numberAsStringA, String numberAsStringB) { ... }
All three use the same underlying logic. For example: maybe the double version is the only one that compares numbers, and the other two just convert their inputs to double.
We could imagine a few different unit tests: first input is larger, second is larger, both inputs are negative, etc.
My Question
Should all three methods have the full set of tests (black box since we don't assume the core implementation is the same)
or
Should only the double version be tested heavily and the other two tested lightly to verify parameter conversion (white box testing since we know they share the same implementation and it's already been tested in the double tests)?
If all of those methods are public, i.e. callable by the outside world, I'd definitely test all of them with a full set of tests. One good reason is that white-box tests are more brittle than black-box tests: if the implementation changes, the behaviour behind the public contract might change for some of those methods, and a reduced test suite wouldn't notice.
There is a set of tests that explicitly exercises the public interfaces. I would treat those as black-box tests.
There is a second set of tests that looks at the corner cases of the implementation. This is white-box testing and surely has a place in a unit test; you can't know the interesting paths without some white-box implementation knowledge. I would pay particular attention to the String case, because the interface allows for strings that may not convert cleanly to doubles, or that push the boundaries of precision, etc.
Would I cut a few corners in the integer case, knowing that I pushed the paths in the double case? I probably shouldn't, but I might well under time pressure.
It depends.
Do you think the implementation is likely to change? If so, then go with black-box testing.
If you can guarantee that the implementation won't change, go with white-box testing. However, the chances of you being able to guarantee this aren't 100%.
You could compromise and do some of the black box tests, particularly around the boundary conditions. However, writing the tests should be easy - so there's no excuse from that point of view for not doing full black box testing. The only limiting factor is the time it takes to run the tests.
Perhaps you should investigate the possibility of running the tests in parallel.
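One way to keep "the full set of tests for every overload" affordable is to drive all three overloads from the same case list. Here is a hedged C++/googletest sketch; the value-returning largestNumber overloads are stand-ins I invented so the assertions have something to check (the question's methods print instead of returning):
#include <gtest/gtest.h>
#include <string>

// Hypothetical value-returning counterparts of the question's overloads.
double largestNumber(double a, double b) { return a > b ? a : b; }
double largestNumber(int a, int b) {
    return largestNumber(static_cast<double>(a), static_cast<double>(b));
}
double largestNumber(const std::string& a, const std::string& b) {
    return largestNumber(std::stod(a), std::stod(b));
}

// The same black-box cases, applied to every public overload.
TEST(LargestNumber, AllOverloadsMeetTheSameSpec) {
    EXPECT_EQ(2.0, largestNumber(1, 2));     // second larger
    EXPECT_EQ(2.0, largestNumber(2, 1));     // first larger
    EXPECT_EQ(-1.0, largestNumber(-1, -2));  // both negative

    EXPECT_EQ(2.0, largestNumber(1.0, 2.0));
    EXPECT_EQ(2.0, largestNumber(2.0, 1.0));
    EXPECT_EQ(-1.0, largestNumber(-1.0, -2.0));

    EXPECT_EQ(2.0, largestNumber(std::string("1"), std::string("2")));
    EXPECT_EQ(2.0, largestNumber(std::string("2"), std::string("1")));
    EXPECT_EQ(-1.0, largestNumber(std::string("-1"), std::string("-2")));
}
The String overload still deserves extra cases of its own (non-numeric input, values that push the limits of precision), as pointed out above.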

What type of errors could my code still contain even if I have 100% code coverage?

What type of errors could my code still contain even if I have 100% code coverage? I'm looking for concrete examples or links to concrete examples of such errors.
Having 100% code coverage is not as great as one may think. Consider a trivial example:
double Foo(double a, double b)
{
    return a / b;
}
Even a single unit test will raise the code coverage of this method to 100%, but that unit test will not tell us which code is working and which is not. This might be perfectly valid code, but without testing edge conditions (such as when b is 0.0) the unit test is inconclusive at best.
Code coverage only tells us what was executed by our unit tests, not whether it was executed correctly. This is an important distinction to make. Just because a line of code is executed by a unit test does not necessarily mean that that line of code is working as intended.
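A minimal googletest sketch of that distinction, using the Foo above (the division-by-zero assertion assumes IEEE 754 doubles, where 1.0 / 0.0 yields infinity rather than an error):
#include <cmath>
#include <gtest/gtest.h>

double Foo(double a, double b) {
    return a / b;
}

// This single test already gives Foo 100% line coverage...
TEST(FooCoverage, HappyPath) {
    EXPECT_DOUBLE_EQ(2.0, Foo(6.0, 3.0));
}

// ...but only an explicit edge-case test says anything about b == 0.0.
TEST(FooCoverage, DivisionByZero) {
    EXPECT_TRUE(std::isinf(Foo(1.0, 0.0)));  // is infinity really what callers expect?
}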
Code coverage doesn't mean that your code is bug-free in any way. It's an estimate of how well your test cases cover your source code base. 100% code coverage implies that every line of code is executed, but certainly not that every state of your program is exercised. There's research being done in this area; I think it's referred to as finite state modeling, but it's really a brute-force way of trying to explore every state of a program.
A more elegant way of doing the same thing is something referred to as abstract interpretation. MSR (Microsoft Research) has released something called Code Contracts based on abstract interpretation. Check out Pex as well; it really emphasizes clever methods of testing application run-time behavior.
I could write a really good test which would give me good coverage, but there's no guarantee that that test will explore all the states my program might have. That is the problem with writing really good tests: it is hard.
Code coverage does not imply good tests
Uh? Any kind of ordinary logic bug, I guess? Memory corruption, buffer overruns, plain old wrong code, assignment-instead-of-test, the list goes on. Coverage is only that: it lets you know that all code paths are executed, not that they are correct.
As I haven't seen it mentioned in this thread yet, I'd like to add that code coverage does not tell you which parts of your code are bug-free.
It only tells you which parts of your code are guaranteed to be untested.
1. "Data space" problems
Your (bad) code:
void f(int n, int increment)
{
    while(n < 500)
    {
        cout << n;
        n += increment;
    }
}
Your test:
f(200,100);
Bug in real world use:
f(200,0);
My point: Your test may cover 100% of the lines of your code but it will not (typically) cover all your possible input data space, i.e. the set of all possible values of inputs.
2. Testing against your own mistake
Another classical example is when you make a bad design decision and then test your code against your own bad decision.
E.g. the spec document says "print all prime numbers up to n", you print all prime numbers up to but excluding n, and your unit tests test your wrong idea.
3. Undefined behaviour
Use the value of an uninitialized variable, cause an invalid memory access, etc., and your code has undefined behaviour (in C++ or any other language that contemplates "undefined behaviour"). Sometimes it will pass your tests, but it will crash in the real world.
...
There can always be runtime exceptions: memory filling up, database or other connections not being closed etc...
Consider the following code:
int add(int a, int b)
{
    return a + b;
}
This code could fail to implement some necessary functionality (i.e. not meet an end-user requirement): "100% coverage" doesn't necessarily test/detect functionality which ought to be implemented but isn't.
This code could work for some but not all input data ranges (e.g. when a and b are both very large).
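A sketch of the kind of wide-range case such a suite easily omits. Note that for inputs this large a + b overflows int, which is undefined behaviour in C++, so this test documents a defect rather than a value you can rely on observing:
#include <climits>
#include <gtest/gtest.h>

int add(int a, int b) { return a + b; }  // fully "covered" by any single call

TEST(Add, VeryLargeInputs) {
    long long expected = static_cast<long long>(INT_MAX) + INT_MAX;
    // Fails (or worse), because the int return type cannot hold the true sum.
    EXPECT_EQ(expected, static_cast<long long>(add(INT_MAX, INT_MAX)));
}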
Code coverage doesn't mean anything if your tests contain bugs or you are testing the wrong thing.
As a related tangent, I'd like to remind you that I can trivially construct an O(1) method that satisfies the following pseudo-code test:
sorted = sort(2,1,6,4,3,1,6,2);
for element in sorted {
    if (is_defined(previousElement)) {
        assert(element >= previousElement);
    }
    previousElement = element;
}
bonus karma to Jon Skeet, who pointed out the loophole I was thinking about
Code coverage usually only tells you how many of the branches within a function are covered. It doesn't usually report the various paths that could be taken between function calls. Many errors in programs happen because the handoff from one method to another is wrong, not because the methods themselves contain errors. All bugs of this form could still exist with 100% code coverage.
In a recent IEEE Software paper "Two Mistakes and Error-Free Software: A Confession", Robert Glass argued that in the "real world" there are more bugs caused by what he calls missing logic or combinatorics (which can't be guarded against with code coverage tools) than by logic errors (which can).
In other words, even with 100% code coverage you still run the risk of encountering these kinds of errors. And the best thing you can do is--you guessed it--do more code reviews.
works on my machine
Many things work well on a local machine, and we cannot be sure they will work in Staging/Production. Code coverage may not cover this.
Errors in tests :)
Well, if your tests don't test the thing that actually happens in the covered code.
If you have this method which adds a number to properties for example:
public void AddTo(int i)
{
    NumberA += i;
    NumberB -= i;
}
If your test only checks the NumberA property but not NumberB, then you will have 100% coverage and the test passes, yet NumberB could still be computed incorrectly without anyone noticing.
Conclusion: a unit test suite with 100% coverage will not guarantee that the code is bug-free.
Argument validation, a.k.a. null checks. If you take any external inputs and pass them into functions but never check whether they are valid/null, then you can achieve 100% coverage, but you will still get a NullReferenceException if you somehow pass null into the function because that's what your database gives you.
Also, arithmetic overflow, like:
int result = int.MaxValue + int.MaxValue;
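A tiny C++ analogue of the null-check point (the function and names are invented for illustration):
#include <cstddef>
#include <string>

// 100% line coverage is reached by any happy-path call...
std::size_t nameLength(const std::string* name) {
    return name->size();  // ...yet there is no null check here
}

// nameLength(&someString) covers every line and passes, while
// nameLength(nullptr) still dereferences null in production, and no
// coverage report will ever flag it.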
Code coverage only covers existing code; it will not be able to point out where you should add more code.
I don't know about anyone else, but we don't get anywhere near 100% coverage. None of our "This should never happen" CATCHes get exercised in our tests (well, sometimes they do, but then the code gets fixed so they don't any more!). I'm afraid I don't worry that there might be a syntax/logic error in a never-happen CATCH.
Your product might be technically correct, but not fulfil the needs of the customer.
FYI, Microsoft Pex attempts to help out by exploring your code and finding "edge" cases, like divide by zero, overflow, etc.
This tool is part of VS2010, though you can install a tech preview version in VS2008. It's pretty remarkable that the tool finds the stuff it finds, though, IME, it's still not going to get you all the way to "bulletproof".
Code Coverage doesn't mean much. What matters is whether all (or most) of the argument values that affect the behavior are covered.
For example, consider a typical compareTo method (in Java, but this applies in most languages):
// Return negative, 0 or positive depending on whether x is <, == or > y
int compareTo(int x, int y) {
    return x - y;
}
As long as you have a test for compareTo(0,0), you get full code coverage. However, you need at least 3 test cases here (one for each of the 3 outcomes), and it is still not bug-free. It also pays to add tests covering exceptional/error conditions. In the above case, if you try compareTo(10, Integer.MIN_VALUE), it is going to fail, because the subtraction overflows.
Bottom line: try to partition your input into disjoint sets based on behavior, and have a test for at least one input from each set. This will add more coverage in the true sense.
Also check for tools like QuickCheck (if available for your language).
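A C++ analogue of the overflow trap and one way out of it (compareTo here is my stand-in for the Java method above):
#include <climits>

// x - y overflows for operands of opposite sign and large magnitude
// (e.g. x = 10, y = INT_MIN), silently flipping the sign of the result.
// Explicit comparisons cover the same three behaviour partitions safely:
int compareTo(int x, int y) {
    if (x < y) return -1;
    if (x > y) return 1;
    return 0;
}

// Partition-based cases: one per outcome, plus sign-boundary pairs such as
// (10, INT_MIN) and (INT_MIN, 10) that break the subtraction version.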
Perform 100% code coverage, i.e. 100% of instructions, 100% of input and output domains, 100% of paths, 100% of whatever you can think of, and you may still have bugs in your code: missing features.
As mentioned in many of the answers here, you could have 100% code coverage and still have bugs.
On top of that, you can have zero bugs, yet the logic in your code may still be incorrect (not matching the requirements). Code coverage, or even being 100% bug-free, can't help you with that at all.
A typical corporate software development practice could be as follows:
1. Have a clearly written functional specification
2. Have a test plan that's written against (1) and have it peer reviewed
3. Have test cases written against (2) and have them peer reviewed
4. Write code against the functional specification and have it peer reviewed
5. Test your code against the test cases
6. Do code coverage analysis and write more test cases to achieve good coverage
Note that I said "good" and not "100%". 100% coverage may not always be feasible to achieve -- in which case your energy is best spent on achieving correctness of code, rather than the coverage of some obscure branches. Different sorts of things can go wrong in any of the steps 1 through 5 above: wrong idea, wrong specification, wrong tests, wrong code, wrong test execution... The bottom line is, step 6 alone is not the most important step in the process.
Concrete example of wrong code that doesn't have any bugs and has 100% coverage:
/**
 * Returns the duration in milliseconds
 */
int getDuration() {
    return end - start;
}

// test:
start = 0;
end = 1;
assertEquals(1, getDuration()); // yay!

// but the requirement was:
// The interface should have a method for returning the duration in *seconds*.
Almost anything.
Have you read Code Complete? (Because StackOverflow says you really should.) In Chapter 22 it says "100% statement coverage is a good start, but it is hardly sufficient". The rest of the chapter explains how to determine which additional tests to add. Here's a brief taster.
Structured basis testing and data flow testing mean testing each logic path through the program. There are four paths through the contrived code below, depending on the values of A and B. 100% statement coverage could be achieved by testing only two of the four paths, perhaps f=1:g=1/f and f=0:g=f+1. But f=0:g=1/f will give a divide-by-zero error. You have to consider if statements, while and for loops (the loop body might never be executed), and every branch of a select or switch statement.
If A Then
    f = 1
Else
    f = 0
End If
If B Then
    g = f + 1
Else
    g = 1 / f
End If
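Translated into C++ for concreteness (pathExample is a made-up name), the coverage gap looks like this:
int pathExample(bool A, bool B) {
    int f = A ? 1 : 0;
    int g = B ? f + 1 : 1 / f;  // divides by zero when A is false and B is false
    return g;
}

// pathExample(true, false) and pathExample(false, true) already execute every
// statement, yet the path pathExample(false, false) still divides by zero.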
Error guessing - informed guesses about types of input that often cause errors. For instance boundary conditions (off by one errors), invalid data, very large values, very small values, zeros, nulls, empty collections.
And even so there can be errors in your requirements, errors in your tests, etc - as others have mentioned.