I'm confused about the number of test cases needed for a boolean function. Say I'm writing a function to check whether the sale price of something is over $60.
function checkSalePrice(price) {
return (price > 60)
}
In my Advanced Placement course, they say that the minimum set of tests should include boundary values, so in this case an example set of tests would be [30, 60, 90]. The course I'm currently taking, though, says to test only two values, one lower and one higher, e.g. (30, 90).
Which is correct? (I know this is pondering the depth of a cup of water, but I'd like to get a few more samples as I'm new to programming)
Kent Beck wrote
I get paid for code that works, not for tests, so my philosophy is to test as little as possible to reach a given level of confidence (I suspect this level of confidence is high compared to industry standards, but that could just be hubris). If I don't typically make a kind of mistake (like setting the wrong variables in a constructor), I don't test for it. I do tend to make sense of test errors, so I'm extra careful when I have logic with complicated conditionals. When coding on a team, I modify my strategy to carefully test code that we, collectively, tend to get wrong.
Me? I make fence post errors. So I would absolutely want to be sure that my test suite would catch the following incorrect implementation of checkSalePrice
function checkSalePrice(price) {
return (price >= 60)
}
If I were writing checkSalePrice using test-driven development, then I would want to calibrate my tests by ensuring that they fail before I make them pass. Since in my programming environment a trivial boolean function returns false, my flow would look like
assert checkSalePrice(61)
This would fail, because the method by default returns false. Then I would implement
function checkSalePrice(price) {
return true
}
Now my first check passes, so I know that this boundary case is correctly covered. I would then add a new check
assert ! checkSalePrice(60)
which would fail. Providing the corrected implementation would pass the check, and now I can confidently refactor the method as necessary.
Adding a third check here for an arbitrary value isn't going to provide additional safety when changing the code, nor is it going to make the life of the next maintainer any easier, so I would settle for two cases here.
Note that the heuristic I'm using is not related to the complexity of the returned value, but to the complexity of the method.
Complexity of the predicate might include covering various problems reading the input. For instance, if we were passing a collection, what cases do we want to make sure are covered? J. B. Rainsberger suggested the following mnemonic (one possible reading is sketched after the list):
zero
one
many
lots
oops
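To make the mnemonic concrete, here is a small sketch for a hypothetical method that sums the prices in a collection. The method, its name, and the choice of what "oops" means are all illustrative assumptions, not something taken from the answer above.

import java.util.Collections;
import java.util.List;

public class SumOfPricesCases {
    public static void main(String[] args) {
        // run with assertions enabled: java -ea SumOfPricesCases
        assert sumOfPrices(Collections.emptyList()) == 0.0;                      // zero
        assert sumOfPrices(List.of(30.0)) == 30.0;                               // one
        assert sumOfPrices(List.of(30.0, 60.0, 90.0)) == 180.0;                  // many
        assert sumOfPrices(Collections.nCopies(1_000_000, 1.0)) == 1_000_000.0;  // lots
        assert sumOfPrices(null) == 0.0;                                         // oops: pick a policy for bad input
    }

    static double sumOfPrices(List<Double> prices) {
        if (prices == null) return 0.0;   // one possible "oops" policy; throwing is another
        double total = 0.0;
        for (double p : prices) total += p;
        return total;
    }
}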
Bruce Dawson points out that there are only 4 billion floats, so maybe you should test them all.
Do note, though, that those extra 4 billion minus two checks aren't adding a lot of design value, so we've probably crossed from TDD into a different realm.
You've stumbled onto one of the big problems with testing in general: how many tests are good enough?!
There are basically three ways to look at this:
black box testing: you do not care about the internals of your MuT (method under test). You only focus on the contract of the method. In your case: should return true when price > 60. When you think about this for a while, you would find tests 30 and 90 ... and maybe 60 as well. It is always good practice to test corner cases. So the answer would be: 3
white box testing: you do coverage measurements of your tests, and you strive, for example, to hit all paths at least once. In this case, you could go with 30 and 90, which would result in 100% coverage. So the answer here: 2
randomized testing, as exemplified by QuickCheck. This approach is quite different: you don't specify test cases at all. Instead you step back and identify rules that should hold true about your MuT. Then the framework creates random input and invokes your MuT with it, trying to find examples where the aforementioned rules break.
In your case, such a rule could be: when checkSalePrice(a) and checkSalePrice(b), then checkSalePrice(a+b). This approach feels unusual at first, but as soon as you start exploring its possibilities, you can find very interesting things in it. Especially when you understand that your code can provide the required "generator" functions to the framework. That allows you to use this approach to test even much more complicated, "object oriented" stuff. It is just great to watch the framework find a flaw, and to then realize that the framework will even find the "minimum" example data required to break a rule that you specified.
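To make that concrete, here is a minimal hand-rolled sketch of the idea in Java, with no framework: generate random inputs and check the rule against each of them. A real QuickCheck-style library would also shrink any failing example down to a minimal one for you.

import java.util.Random;

public class CheckSalePriceProperty {
    static boolean checkSalePrice(double price) {
        return price > 60;
    }

    public static void main(String[] args) {
        Random random = new Random();
        for (int i = 0; i < 1000; i++) {
            double a = random.nextDouble() * 200;   // arbitrary input range for the sketch
            double b = random.nextDouble() * 200;
            // Rule: if both prices individually qualify, then so does their sum.
            if (checkSalePrice(a) && checkSalePrice(b) && !checkSalePrice(a + b)) {
                throw new AssertionError("Rule broken for a=" + a + ", b=" + b);
            }
        }
        System.out.println("Rule held for 1000 random inputs.");
    }
}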
I've seen this pseudo-random number generator for use in shaders referred to here and there around the web:
float rand(vec2 co){
return fract(sin(dot(co.xy ,vec2(12.9898,78.233))) * 43758.5453);
}
It's variously called "canonical", or "a one-liner I found on the web somewhere".
What's the origin of this function? Are the constant values as arbitrary as they seem or is there some art to their selection? Is there any discussion of the merits of this function?
EDIT: The oldest reference to this function that I've come across is this archive from Feb '08, the original page now being gone from the web. But there's no more discussion of it there than anywhere else.
Very interesting question!
I am trying to figure this out while typing the answer :)
First an easy way to play with it: http://www.wolframalpha.com/input/?i=plot%28+mod%28+sin%28x*12.9898+%2B+y*78.233%29+*+43758.5453%2C1%29x%3D0..2%2C+y%3D0..2%29
Then let's think about what we are trying to do here: for two input coordinates x,y we return a "random number". It isn't really a random number, though: it's the same every time we input the same x,y. It's a hash function!
The first thing the function does is go from 2D to 1D (the dot product). That is not interesting in itself, but the constants are chosen so that the result typically does not repeat. We also have a floating-point multiply and add in there; a few more bits come from x and y, and the numbers might just be chosen right so that it does a decent mix.
Then we sample a black box sin() function. This will depend a lot on the implementation!
Lastly it amplifies the error in the sin() implementation by multiplying and taking the fraction.
I don't think this is a good hash function in the general case. The sin() is a black box on the GPU, numerically. It should be possible to construct a much better one by taking almost any hash function and converting it. The hard part is to turn the typical integer operations used in CPU hashing into float (half or 32-bit) or fixed-point operations, but it should be possible.
Again, the real problem with this as a hash function is that sin() is a black box.
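To illustrate what "taking almost any hash function and converting it" could look like, here is a minimal CPU-side sketch in Java rather than GLSL. It sidesteps the hard part mentioned above, since on the CPU we have full integer operations; the mixing constants are arbitrary odd numbers chosen for illustration, not a vetted design.

public class FloatHash {
    // Mix the raw bit patterns of the two inputs with xorshift-multiply steps,
    // then map the 32-bit result onto [0, 1).
    static float hash(float x, float y) {
        int h = Float.floatToIntBits(x) * 0x27d4eb2d ^ Float.floatToIntBits(y) * 0x165667b1;
        h ^= h >>> 15;
        h *= 0x2c1b3c6d;
        h ^= h >>> 12;
        h *= 0x297a2d39;
        h ^= h >>> 15;
        return (h >>> 8) / (float) (1 << 24);   // keep the top 24 bits -> [0, 1)
    }
}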
The origin is probably the paper: "On generating random numbers, with help of y= [(a+x)sin(bx)] mod 1", W.J.J. Rey, 22nd European Meeting of Statisticians and the 7th Vilnius Conference on Probability Theory and Mathematical Statistics, August 1998
EDIT: Since I can't find a copy of this paper and the "TestU01" reference may not be clear, here's the scheme as described in TestU01 in pseudo-C:
#define A1 ???
#define A2 ???
#define B1 pi*(sqrt(5.0)-1)/2
#define B2 ???
uint32_t n; // position in the stream
double next() {
double t = fract(A1 * sin(B1*n)); // fract(x) = x - floor(x)
double u = fract((A2+t) * sin(B2*t));
n++;
return u;
}
where B1 is the only constant whose value is actually recommended.
Notice that this is for a stream. To convert it into a 1D hash, 'n' becomes the integer grid position. So my guess is that someone saw this and converted 't' into a simple function f(x,y). Using the original constants above, that would yield:
float hash(vec2 co){
float t = 12.9898*co.x + 78.233*co.y;
return fract((A2+t) * sin(t)); // any B2 is folded into 't' computation
}
The constant values are arbitrary; notably, they are very large and only a couple of decimals away from prime numbers.
Taking the fractional part (a modulus over 1) of a high-amplitude sine, multiplied by a large constant like 43758.5453, gives a periodic function. It's like a window blind or corrugated metal: made very small because of the large multiplier, and turned at an angle by the dot product.
As the function is 2D, the dot product has the effect of turning the periodic function at an oblique angle relative to the X and Y axes, by a ratio of approximately 13:79. It is somewhat inefficient; I think you could achieve much the same by taking the sine of (13x + 79y) directly, with less math.
If you find the period of the function in both X and Y, you can sample it so that it will look like a simple sine wave again.
[Image: a zoomed-in graph of the function]
I don't know the origin, but it is similar to many others: if you used it in graphics at regular intervals, it would tend to produce moiré patterns, and you could see that it eventually wraps around.
Maybe it's some non-recurrent chaotic mapping, which could explain many things, but it could also just be an arbitrary manipulation of large numbers.
EDIT: Basically, the function fract(sin(x) * 43758.5453) is a simple hash-like function. The sin(x) part varies smoothly between -1 and 1, so sin(x) * 43758.5453 varies between -43758.5453 and 43758.5453. This is quite a huge range, so even a small step in x produces a large step in the result and a really large variation in the fractional part. The fract is needed to bring the value back into the range 0.0 to 0.999...
Now, when we have something like a hash function, we need a way to produce a hash from a vector. The simplest way would be to call the hash separately for the x and y components of the input vector, but then we would get symmetrical values. So we should instead reduce the vector to a single value: the approach is to pick some "random" vector and take the dot product with it, and here we go: fract(sin(dot(co.xy, vec2(12.9898,78.233))) * 43758.5453);
Also, the selected vector should be long enough that the result of the dot product spans several periods of the sin function.
I do not believe this to be the true origin, but the OP's code is presented as a code example in "The Book of Shaders" by Patricio Gonzalez Vivo and Jen Lowe ( https://thebookofshaders.com/10/ ). In their code, Patricio Gonzalez Vivo is cited as the author, i.e. "// Author #patriciogv - 2015".
Since the OP's research dates back even further (to '08), the source might at least explain its popularity, and the author might be able to shed some light on his source.
Say I have three methods, all very similar but with different input types:
void printLargestNumber(int a, int b) { ... }
void printLargestNumber(double a, double b) { ... }
void printLargestNumber(String aAsString, String bAsString) { ... }
All three use the same underlying logic. For example: maybe the double version is the only one that compares numbers, and the other two just convert their inputs to double.
We could imagine a few different unit tests: first input is larger, second is larger, both inputs are negative, etc.
My Question
Should all three methods have the full set of tests (black box since we don't assume the core implementation is the same)
or
Should only the double version be tested heavily and the other two tested lightly to verify parameter conversion (white box testing since we know they share the same implementation and it's already been tested in the double tests)?
If all of those methods are public, i.e. callable by the outside world, I'd definitely test all of them with a full set of tests. One good reason is that white-box tests are more brittle than black-box tests; if the implementation changes, the public contract might (unintentionally) change for some of those methods.
There are a set of tests that explicitly exercise the public interfaces. I would treat those as black-box tests.
There are a second set of tests that could be seen as looking at the corner cases of the implementation. This is white box testing and surely has a place in a Unit test. You can't know the interesting paths without some white-box implementation knowledge. I would pay particular attention to the String case, because the interface allows for strings that may not convert cleanly to doubles, that push the boundaries of precision etc.
Would I cut a few corners in the integer case, knowing I've already pushed the paths in the double case? I probably shouldn't, but I might well under time pressure.
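For instance, these are the kinds of extra checks I'd want for the String overload, sketched in JUnit 4. The expected outcomes are assumptions about the contract you choose (throw, trim, clamp, ...), not something implied by the question; the point is only that these inputs deserve tests of their own.

import org.junit.Test;

public class PrintLargestNumberStringTest {

    @Test(expected = NumberFormatException.class)
    public void rejectsNonNumericInput() {
        printLargestNumber("twelve", "3");   // assumed contract: throw on garbage
    }

    @Test
    public void decidesWhatHappensBeyondDoublePrecision() {
        // 9007199254740993 (2^53 + 1) is not exactly representable as a double.
        printLargestNumber("9007199254740993", "9007199254740992");
    }

    @Test
    public void decidesWhatHappensWithWhitespace() {
        printLargestNumber(" 42 ", "7");
    }

    // Stand-in so the sketch compiles; in reality this is the overload under test.
    private static void printLargestNumber(String a, String b) {
        System.out.println(Math.max(Double.parseDouble(a), Double.parseDouble(b)));
    }
}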
It depends.
Do you think the implementation is likely to change? If so then go with black box testing.
If you can guarantee that the implementation won't change, go with white box. However, the chances of you being able to guarantee this aren't 100%.
You could compromise and do some of the black box tests, particularly around the boundary conditions. However, writing the tests should be easy - so there's no excuse from that point of view for not doing full black box testing. The only limiting factor is the time it takes to run the tests.
Perhaps you should investigate the possibility of running the tests in parallel.
For my ColorJizz library, I'm expecting slight errors when you do multiple conversions between different color formats. The errors are all very small (e.g. 0.0001 out).
What do you think I should do about these?
I feel like there are 2 real options:
Leave them as they are, with almost 30% of tests failing
Put some kind of 'error range' in my unit tests and pass them if they're within that range. But how do I judge what level of error I should allow?
Here's an example of the kind of failures I'm getting:
http://www.mikeefranklin.co.uk/tests/test/
What would be the best solution?
It seems you are using floating point values, for which rounding errors are a fact of life. I recommend applying an error margin for comparison checks in your unit tests.
Leaving even some of your unit tests failing is not a realistic option - unit tests should pass 100% under normal circumstances. If you let some of them fail regularly, you won't easily notice when there is a new failure, signifying a real bug in your code.
The error range is the standard approach for floating point "equality" tests.
NUnit uses "within":
Assert.That( 2.1 + 1.2, Is.EqualTo( 3.3 ).Within( .0005 ) );
Ruby's test/unit uses assert_in_delta:
assert_in_delta 0.05, (50000.0 / 10**6), 0.00001
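JUnit has an assertEquals overload that takes a delta for doubles:
assertEquals(3.3, 2.1 + 1.2, 0.0005);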
And most other test frameworks have something similar. Apparently qunit is one that does not have something similar, but it would be easy enough to modify the source to include one of your design.
As for the actual delta to use, it depends on your application. I would think that 0.01 would actually be pretty strict relative to what humans can visually distinguish in color differences, but it would be a fairly lax requirement mathematically.
Gallio/MbUnit has a dedicated assertion for that specific test case (Assert.AreApproximatelyEqual)
double a = 5;
double b = 4.999;
Assert.AreApproximatelyEqual(a, b, 0.001); // Pass!
What type of errors could my code still contain even if I have 100% code coverage? I'm looking for concrete examples or links to concrete examples of such errors.
Having 100% code coverage is not as great as one might think. Consider a trivial example:
double Foo(double a, double b)
{
return a / b;
}
Even a single unit test will raise the code coverage of this method to 100%, but that unit test will not tell us which code is working and which is not. This might be perfectly valid code, but without testing edge conditions (such as when b is 0.0) the unit test is inconclusive at best.
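To make that concrete, here is roughly what such a test could look like, sketched as a JUnit test in Java (a direct translation of the example above; the names are mine):

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class FooCoverageTest {
    static double foo(double a, double b) {
        return a / b;
    }

    @Test
    public void divides() {
        assertEquals(2.0, foo(8.0, 4.0), 1e-9);   // executes every line: coverage reports 100%
        // Nothing here pins down foo(1.0, 0.0), which silently returns Infinity.
    }
}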
Code coverage only tells us what was executed by our unit tests, not whether it was executed correctly. This is an important distinction to make. Just because a line of code is executed by a unit test, does not necessarily mean that that line of code is working as intended.
Listen to this for an interesting discussion.
Code coverage doesn't mean that your code is bug-free in any way. It's a measure of how well your test cases cover your source code. 100% code coverage implies that every line of code is executed, but certainly not that every state of your program is exercised. There's research being done in this area; I think it's referred to as finite state modeling, but it's really a brute-force way of trying to explore every state of a program.
A more elegant way of doing the same thing is something referred to as abstract interpretation. MSR (Microsoft Research) has released something called CodeContracts based on abstract interpretation. Check out Pex as well; they really emphasize clever methods of testing application run-time behavior.
I could write a really good test which would give me good coverage, but there's no guarantee that the test will explore all the states my program might have. That is the problem of writing really good tests, and it is hard.
Code coverage does not imply good tests
Uh? Any kind of ordinary logic bug, I guess? Memory corruption, buffer overrun, plain old wrong code, assignment-instead-of-test, the list goes on. Coverage is only that, it lets you know that all code paths are executed, not that they are correct.
As I haven't seen it mentioned yet in this thread, I'd like to add that code coverage does not tell you what part of your code is bug-free.
It only tells you what parts of your code are guaranteed to be untested.
1. "Data space" problems
Your (bad) code:
#include <iostream>

void f(int n, int increment)
{
    while(n < 500)
    {
        std::cout << n;
        n += increment;
    }
}
Your test:
f(200,100);
Bug in real world use:
f(200,0);
My point: Your test may cover 100% of the lines of your code but it will not (typically) cover all your possible input data space, i.e. the set of all possible values of inputs.
2. Testing against your own mistake
Another classic example is when you make a bad design decision and then test your code against your own bad decision.
E.g. the spec document says "print all prime numbers up to n", you print all prime numbers up to n but excluding n, and your unit tests test your wrong idea.
3. Undefined behaviour
Use the value of uninitialized variables, cause an invalid memory access, etc. and your code has undefined behaviour (in C++ or any other language that contemplates "undefined behaviour"). Sometimes it will pass your tests, but it will crash in the real world.
...
There can always be runtime exceptions: memory filling up, database or other connections not being closed etc...
Consider the following code:
int add(int a, int b)
{
return a + b;
}
This code could fail to implement some necessary functionality (i.e. not meet an end-user requirement): "100% coverage" doesn't necessarily test/detect functionality which ought to be implemented but isn't.
This code could work for some but not all input data ranges (e.g. when a and b are both very large).
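As a sketch of that second point: a single happy-path JUnit test gives 100% coverage of add while saying nothing about overflow.

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class AddCoverageTest {
    static int add(int a, int b) {
        return a + b;
    }

    @Test
    public void addsSmallNumbers() {
        assertEquals(4, add(2, 2));   // every line of add() executed: 100% coverage
    }

    // Untested and silently wrong: add(Integer.MAX_VALUE, 1)
    // wraps around to Integer.MIN_VALUE instead of failing loudly.
}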
Code coverage doesn't mean anything, if your tests contain bugs, or you are testing the wrong thing.
As a related tangent, I'd like to remind you that I can trivially construct an O(1) method that satisfies the following pseudo-code test:
sorted = sort(2,1,6,4,3,1,6,2);
for element in sorted {
if (is_defined(previousElement)) {
assert(element >= previousElement);
}
previousElement = element;
}
bonus karma to Jon Skeet, who pointed out the loophole I was thinking about
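To spell out one such loophole (presumably the one meant): the test only checks that consecutive elements of the result are ordered, not that the result has anything to do with the input, so an O(1) implementation that returns an empty result passes it trivially. A Java sketch:

import java.util.Collections;
import java.util.List;

public class BogusSort {
    // The pairwise-ordering assertion above never runs for an empty result,
    // so this "sort" passes the test in O(1). Returning any fixed sorted list
    // would work just as well.
    static List<Integer> sort(int... input) {
        return Collections.emptyList();
    }
}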
Code coverage usually only tells you how many of the branches within a function are covered. It doesn't usually report the various paths that could be taken between function calls. Many errors in programs happen because the handoff from one method to another is wrong, not because the methods themselves contain errors. All bugs of this form could still exist with 100% code coverage.
In a recent IEEE Software paper "Two Mistakes and Error-Free Software: A Confession", Robert Glass argued that in the "real world" there are more bugs caused by what he calls missing logic or combinatorics (which can't be guarded against with code coverage tools) than by logic errors (which can).
In other words, even with 100% code coverage you still run the risk of encountering these kinds of errors. And the best thing you can do is--you guessed it--do more code reviews.
The reference to the paper is here and I found a rough summary here.
works on my machine
Many things work well on the local machine, and we cannot assume they will also work in Staging/Production. Code coverage may not cover this.
Errors in tests :)
Well, what if your tests don't actually test the thing that happens in the covered code?
For example, if you have this method which adds a number to two properties:
public void AddTo(int i)
{
NumberA += i;
NumberB -= i;
}
If your test only checks the NumberA property, but not NumberB, then you will have 100% coverage, the test passes, but NumberB will still contain an error.
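Rendered as a test (sketched here in Java with plain fields instead of C# properties), the situation looks like this:

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class AddToTest {
    int numberA;
    int numberB;

    void addTo(int i) {
        numberA += i;
        numberB -= i;   // the bug: presumably this should also add
    }

    @Test
    public void addsToBothNumbers() {
        addTo(5);
        assertEquals(5, numberA);   // passes, and both lines of addTo were executed
        // numberB is never checked, so its bug survives despite 100% coverage.
    }
}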
Conclusion: a unit test suite with 100% coverage will not guarantee that the code is bug-free.
Argument validation, a.k.a. null checks. If you take external input and pass it into functions but never check whether it is valid/null, then you can achieve 100% coverage, but you will still get a NullReferenceException if you somehow pass null into the function because that's what your database gave you.
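A minimal illustration (sketched in Java, so the failure is a NullPointerException; the method itself is hypothetical):

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class GreetingTest {
    static String shout(String name) {
        return name.toUpperCase() + "!";   // no null check on the input
    }

    @Test
    public void shouts() {
        assertEquals("ALICE!", shout("Alice"));   // every line executed: 100% coverage
        // shout(null) still blows up with a NullPointerException at runtime.
    }
}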
Also, arithmetic overflow, for example:
int x = int.MaxValue;
int result = x + x; // silently wraps around to -2 in an unchecked context
Code coverage only covers existing code; it cannot point out where you should add more code.
I don't know about anyone else, but we don't get anywhere near 100% coverage. None of our "This should never happen" CATCHes get exercised in our tests (well, sometimes they do, but then the code gets fixed so they don't any more!). I'm afraid I don't worry that there might be a syntax/logic error in a never-happen CATCH.
Your product might be technically correct, but not fulfil the needs of the customer.
FYI, Microsoft Pex attempts to help out by exploring your code and finding "edge" cases, like divide by zero, overflow, etc.
This tool is part of VS2010, though you can install a tech preview version in VS2008. It's pretty remarkable that the tool finds the stuff it finds, though, IME, it's still not going to get you all the way to "bulletproof".
Code Coverage doesn't mean much. What matters is whether all (or most) of the argument values that affect the behavior are covered.
For example, consider a typical compareTo method (in Java, but this applies in most languages):
// Return negative, 0 or positive depending on whether x is <, = or > y
int compareTo(int x, int y) {
return x-y;
}
As long as you have a test for compareTo(0,0), you get 100% code coverage. However, you need at least 3 test cases here (one for each of the 3 outcomes), and even then it is not bug-free. It also pays to add tests covering exceptional/error conditions. In the above case, if you try compareTo(10, Integer.MIN_VALUE), the subtraction overflows and the method returns the wrong sign.
Bottom line: try to partition your input into disjoint sets based on behavior and have a test for at least one input from each set. That adds coverage in the true sense.
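A sketch of that partitioning for the compareTo above, in JUnit: one representative input per outcome, plus one from the overflow partition (which is the test that actually fails and exposes the bug):

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;
import org.junit.Test;

public class CompareToTest {
    static int compareTo(int x, int y) {
        return x - y;
    }

    @Test
    public void coversAllThreeOutcomes() {
        assertTrue(compareTo(1, 2) < 0);    // less than
        assertEquals(0, compareTo(2, 2));   // equal
        assertTrue(compareTo(3, 2) > 0);    // greater than
    }

    @Test
    public void coversTheOverflowPartition() {
        // 10 > Integer.MIN_VALUE, but x - y overflows and comes out negative,
        // so this assertion fails and reveals the bug.
        assertTrue(compareTo(10, Integer.MIN_VALUE) > 0);
    }
}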
Also check out tools like QuickCheck (if available for your language).
Achieve 100% code coverage, i.e. 100% of instructions, 100% of input and output domains, 100% of paths, 100% of whatever you can think of, and you may still have bugs in your code: missing features.
As mentioned in many of the answers here, you could have 100% code coverage and still have bugs.
On top of that, you can have 0 bugs but the logic in your code may still be incorrect (not matching the requirements). Code coverage, or being 100% bug-free, can't help you with that at all.
A typical corporate software development practice could be as follows:
1. Have a clearly written functional specification
2. Have a test plan that's written against (1) and have it peer reviewed
3. Have test cases written against (2) and have them peer reviewed
4. Write code against the functional specification and have it peer reviewed
5. Test your code against the test cases
6. Do code coverage analysis and write more test cases to achieve good coverage.
Note that I said "good" and not "100%". 100% coverage may not always be feasible to achieve -- in which case your energy is best spent on achieving correctness of code, rather than the coverage of some obscure branches. Different sorts of things can go wrong in any of the steps 1 through 5 above: wrong idea, wrong specification, wrong tests, wrong code, wrong test execution... The bottom line is, step 6 alone is not the most important step in the process.
Concrete example of wrong code that doesn't have any bugs and has 100% coverage:
/**
* Returns the duration in milliseconds
*/
int getDuration() {
return end - start;
}
// test:
start = 0;
end = 1;
assertEquals(1, getDuration()); // yay!
// but the requirement was:
// The interface should have a method for returning the duration in *seconds*.
Almost anything.
Have you read Code Complete? (Because StackOverflow says you really should.) In Chapter 22 it says "100% statement coverage is a good start, but it is hardly sufficient". The rest of the chapter explains how to determine which additional tests to add. Here's a brief taster.
Structured basis testing and data flow testing mean testing each logic path through the program. There are four paths through the contrived code below, depending on the values of A and B. 100% statement coverage could be achieved by testing only two of the four paths, perhaps A-true/B-false (f=1, g=1/f) and A-false/B-true (f=0, g=f+1). But the untested path A-false/B-false (f=0, g=1/f) will give a divide-by-zero error. You have to consider if statements, while and for loops (the loop body might never be executed), and every branch of a select or switch statement.
If A Then
f = 1
Else
f = 0
End If
If B Then
g = f + 1
Else
g = 1 / f
End If
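Sketched in Java: two calls already give 100% statement coverage, while the third path still divides by zero.

static int g(boolean a, boolean b) {
    int f;
    if (a) {
        f = 1;
    } else {
        f = 0;
    }
    if (b) {
        return f + 1;
    } else {
        return 1 / f;
    }
}

// g(true, false)  -> 1   (covers f = 1 and the 1 / f statement)
// g(false, true)  -> 1   (covers f = 0 and the f + 1 statement)
// Together those two calls execute every statement, yet
// g(false, false) still throws ArithmeticException: / by zero.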
Error guessing - informed guesses about types of input that often cause errors. For instance: boundary conditions (off-by-one errors), invalid data, very large values, very small values, zeros, nulls, empty collections.
And even so, there can be errors in your requirements, errors in your tests, etc., as others have mentioned.