How can I safely unit test a function without testing every possible 64-bit input value? - unit-testing

I have a function which checks the parity of a 64-bit word. Sadly the input value really could be anything so I cannot bias my test to cover a known sub-set of values, and I clearly cannot test every single possible 64-bit value...
I considered using random numbers so that each time the test was run, the function gained more coverage; however, unit tests should be consistent.
Ignoring my specific application, is there a sensible way to ensure a reasonable level of coverage, which is highly likely to expose errors introduced in the future, whilst not taking the best part of a billion years to run?

The following argument assumes that you have written, or at least have access to, the source code and are doing white-box testing.
Depending on the level of confidence you need, you might consider proving the algorithm correct, possibly using automated provers. But, under the assumption that your code is not part of an application which demands this level of confidence, you can probably gain sufficient confidence with a comparatively small set of unit tests.
Let's assume that your algorithm somehow loops over the 64 bits (or is intended to do so, because you still need to test it). This means that the 64 bits are handled in a very regular way. Now, there could be a bug in your code such that, in the body of the loop, instead of using the respective bit from the 64-bit input, a value of 0 is always used by mistake. This bug would mean that you always get a parity of 0 as the result. This particular bug can be found by any input value that leads to an expected parity of 1.
From this example we can conclude that for every bug that could realistically occur, you need one corresponding test case that can find that bug. Therefore, if you look at your algorithm and think about which bugs might be present, you may come up with, say, x bugs. Then you will need no more than x test cases to find these bugs. (Some of your test cases will likely find more than one of the bugs.)
This principle has led to a number of strategies for deriving test cases, like equivalence partitioning or boundary testing. With boundary testing, for example, you would put special focus on bits 0 and 63, which are at the boundaries of the loop's indices. This way you catch many of the classical off-by-one errors.
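To make this concrete, here is a hedged sketch of such a hand-picked test set in C++ (the name parity64 and the loop-based stand-in are my assumptions, since your actual code isn't shown):

    #include <cassert>
    #include <cstdint>

    // Stand-in for the function under test (assumed signature):
    // returns 1 if an odd number of bits are set, 0 otherwise.
    int parity64(uint64_t x)
    {
        int p = 0;
        for (int bit = 0; bit < 64; ++bit)
            p ^= static_cast<int>((x >> bit) & 1u);
        return p;
    }

    int main()
    {
        assert(parity64(0x0000000000000000ULL) == 0); // no bits set
        assert(parity64(0xFFFFFFFFFFFFFFFFULL) == 0); // all 64 bits set (even count)
        assert(parity64(0x0000000000000001ULL) == 1); // only bit 0: catches "bit 0 skipped"
        assert(parity64(0x8000000000000000ULL) == 1); // only bit 63: catches "bit 63 skipped"
        assert(parity64(0x8000000000000001ULL) == 0); // both boundary bits together
        assert(parity64(0x0123456789ABCDEFULL) == 0); // a mixed pattern (32 bits set)
    }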
Now, what about the situation that the algorithm changes in the future (as you have asked about errors introduced in the future)? Instead of looping over the 64 bits, the parity can be calculated by xor-ing in various ways. For example, to improve the speed you might first xor the upper 32 bits with the lower 32 bits, then take the result and xor the upper 16 bits with the lower 16 bits, and so on.
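A minimal sketch of such a folding variant, just to make the alternative concrete (this is not necessarily the code under test):

    #include <cstdint>

    // Fold the word onto itself until the parity of all 64 bits
    // ends up in the least significant bit.
    int parity64_fold(uint64_t x)
    {
        x ^= x >> 32;
        x ^= x >> 16;
        x ^= x >> 8;
        x ^= x >> 4;
        x ^= x >> 2;
        x ^= x >> 1;
        return static_cast<int>(x & 1u);
    }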
This alternative algorithm will have a different set of possible bugs. To be future proof with your test cases, you may also have to consider such alternative algorithms and the corresponding bugs. Most likely, however, the test cases for the first algorithm will find a large portion of those bugs as well, so the number of tests will probably not increase too much. The analysis, however, becomes more complex.
In practice, I would focus on the currently chosen algorithm and instead redesign the test suite if the algorithm is ever changed fundamentally.
Sorry if this answer is too generic. But, as should have become clear, a more concrete answer would require more details about the algorithm that you have chosen.

Related

Recommendable test for software integer multiplication?

I've coded a number of integer multiplication routines for Atmel's AVR architecture. I found following a simple pattern for the multiplier (and a similar one for the multiplicand) useful, if unconvincing (start at zero, step by one in every byte, in addition to eventual carries).
There seems to be quite a bit about testing hardware multiplier implementations, but:
What can be recommended for testing software implementations of integer multiplication? Exhaustive testing gets out of hand - if not at, then beyond 16×16 bit.
Most approaches use Generate & Test:
generate test operands
The safest approach is to use all combinations of operands. But with bignums, or with little computing power or memory, this is not possible or practical. In that case, test cases are usually chosen which will (most likely) stress the tested algorithm (like propagating a carry ... or being near the safe limits), and only those are used. Another option is to use pseudo-random operands and hope for a valid test sample, or to combine all of these in some way.
compute multiplication
Just apply the tested algorithm to the generated input data.
assess the results
You need to compare the algorithm's (multiplication) result to something. Usually a different algorithm for the same operation is used for the comparison, or an inverse function (like c=a*b; if (a!=c/b) ...). The problem with this is that you cannot distinguish between an error in the tested algorithm and an error in the reference ... unless you use something known to be 100% correct, or compare against more than just one operation ... There is also the possibility of having a precomputed result table (computed on a different, 100% bug-free platform) and comparing to that.
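As a sketch of that kind of comparison (the routine name mul16 and its shift-and-add stand-in are assumptions, not code from the question; the compiler's native widening multiplication serves as the reference):

    #include <cassert>
    #include <cstdint>

    // Stand-in for the routine under test (e.g. a hand-written shift-and-add
    // multiply); the name and signature are assumptions.
    uint32_t mul16(uint16_t a, uint16_t b)
    {
        uint32_t result = 0;
        uint32_t addend = a;
        for (int bit = 0; bit < 16; ++bit) {
            if (b & (1u << bit))
                result += addend;
            addend <<= 1;
        }
        return result;
    }

    int main()
    {
        // Operands chosen to stress carry propagation and the range limits.
        const uint16_t samples[] = { 0, 1, 2, 0x00FF, 0x0100, 0x7FFF, 0x8000, 0xFFFE, 0xFFFF };
        for (uint16_t a : samples)
            for (uint16_t b : samples) {
                // Reference: the compiler's native widening multiplication.
                uint32_t expected = static_cast<uint32_t>(a) * static_cast<uint32_t>(b);
                assert(mul16(a, b) == expected);
            }
    }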
The AVR tag makes this a tough question, so some hints:
many MCUs have a lot of Flash program memory that can be used to store precomputed values to compare against
sometimes running in an emulator is faster/more convenient, but at the risk of missing some HW-related issues.

About optimized math functions, ranges and intervals

I'm trying to wrap my head around how people who code math functions for games and rendering engines can use an optimized math function in an efficient way; let me explain that further.
There is a high need for fast trigonometric functions in those fields. At times you can optimize a sin, a cos or other functions by rewriting them in a different form that is valid only for a given interval; often this means that your approximation of f(x) covers just the first quadrant, meaning 0 <= x <= pi/2.
Now the input for your f(x) still spans all 4 quadrants, but the real formula only covers 1/4 of that interval. The straightforward solution is to detect the quadrant by analyzing the input and seeing which range it belongs to, and then to adjust the result of the formula accordingly if the input comes from a quadrant other than the first.
This is good in theory, but it also presents a couple of really bad problems, especially considering that you are doing all this to steal a couple of cycles from your CPU (you also get a consistent implementation that is not platform dependent, unlike a hardcoded fsin in Intel x86 asm that only works on x86 and has a certain error range, all of which may differ on other platforms with other asm instructions), so you need to keep things working at a concurrent and high-performance level.
The reason I can't wrap my head around the "switch case" with quadrants solution is:
it just prevents possible optimizations, namely memoization, considering that you usually want to put that switch-case inside the same function that actually computes f(x); the situation could probably be improved by implementing the formula for f(x) outside said function, but this will lead to a doubling of the number of functions to maintain for any given math library
it increases the probability of more branching under concurrent execution
generally speaking it doesn't lead to better, cleaner, DRY code, and conditional statements are often a potential source of bugs; I don't really like switch-cases and similar constructs.
Assuming that I can implement my cross-platform f(x) in C or C++, how do programmers in this field usually address the problem of translating and mapping the inputs and the quadrants to the result via the actual implementation?
Note: In the below answer I am speaking very generally about code.
Assuming that I can implement my cross-platform f(x) in C or C++, how do programmers in this field usually address the problem of translating and mapping the inputs and the quadrants to the result via the actual implementation?
The general answer to this is: In the most obvious and simplest way possible that achieves your purpose.
I'm not entirely sure I follow most of your arguments/questions but I have a feeling you are looking for problems where really none exist. Do you truly have the need to re-implement the trigonometric functions? Don't fall into the trap of NIH (Not Invented Here).
the straightforward solution is to detect the quadrant
Yes! I love straightforward code. Code where it is perfectly obvious at a glance what it does. Now, sometimes, just sometimes, you have to do some crazy things to get it to do what you want: for performance, or to avoid bugs outside of your control. But the first version should be the most obvious and simple code that solves your problem. From there you do testing, profiling, and benchmarking, and if (and only if) you find performance or other issues, then you go into the crazy stuff.
This is good in theory but this also presents a couple of really bad problems,
I would say that this is good in theory and in practice for most cases and I definitely don't see any "bad" problems. Minor issues in specific corner cases or design requirements at most.
A few things on a few of your specific comments:
approximation of f(x) is just for the first quadrant: Yes, and there are multiple reasons for this. One is simply that most trigonometric functions have identities, so you can easily use these to reduce the range of the input parameters; a small sketch of this kind of range reduction follows after these points. This is important as many numerical techniques only work over a specific range of inputs, or are more accurate/performant for small inputs. Next, for very large inputs you'll have to trim the range anyway for most numerical techniques to work, or at least to work in a decent amount of time and with sufficient accuracy. For example, look at the Taylor expansion for cos() and see how long it takes to converge sufficiently for large vs. small inputs.
it just prevents possible optimizations: Chances are your C++ compiler these days is way better at optimizations than you are. Sometimes it isn't, but the general procedure is to let the compiler do its optimization and only do manual optimizations where you have measured and proven that you need them. These days it is very non-intuitive to tell what code is faster just by looking at it (you can read all the questions on SO about performance issues and how crazy some of the root causes are).
namely memoization: I've never seen memoization in place for a double function. Just think how many doubles there are between 0 and 1. Now, in reduced-accuracy situations you can take advantage of it, but this is easily implemented as a custom function tailored for that exact situation. Thinking about it, I'm not exactly sure how to implement memoization for a double function in a way that actually means anything and doesn't lose accuracy or performance in the process.
increase probability of more branching with a concurrent execution: I'm not sure I'd implement trigonometric functions in a concurrent manner, but I suppose it's entirely possible to get some performance benefits. But again, the compiler is generally better at optimizations than you, so let it do its job and then benchmark/profile to see if you really need to do better.
doesn't lead to better, clean, dry code: I'm not sure what exactly you mean here, or what "dry code" is for that matter. Yes, sometimes you can get into trouble with too many or too complex if/switch blocks, but I can't see a simple check for 4 quadrants applying here... it's a pretty basic and simple case.
So for any platform I get the same y for the same values of x: My guess is that getting "exact" values for all 53 bits of double across multiple platforms and systems is not going to be possible. What is the result if you only have 52 bits correct? This would be a great area to do some tests in and see what you get.
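To make the range-reduction point from above concrete, here is a minimal sketch; the quadrant-limited approximation is a placeholder Taylor polynomial, not code from your question:

    #include <cmath>

    // Stand-in for a quadrant-limited approximation, valid for 0 <= x <= pi/2
    // (the name is a placeholder; in practice this would be the optimized formula).
    static double sin_first_quadrant(double x)
    {
        // 7th-order Taylor polynomial, good enough for illustration.
        double x2 = x * x;
        return x * (1.0 - x2 / 6.0 * (1.0 - x2 / 20.0 * (1.0 - x2 / 42.0)));
    }

    // Reduce an arbitrary argument to the first quadrant using the symmetries
    // sin(pi - x) = sin(x) and sin(x + pi) = -sin(x).
    double my_sin(double x)
    {
        const double pi = 3.14159265358979323846;
        const double two_pi = 2.0 * pi;

        x = std::fmod(x, two_pi);          // wrap into (-2*pi, 2*pi)
        if (x < 0.0) x += two_pi;          // wrap into [0, 2*pi)

        if (x <= 0.5 * pi) return  sin_first_quadrant(x);           // 1st quadrant
        if (x <= pi)       return  sin_first_quadrant(pi - x);      // 2nd quadrant
        if (x <= 1.5 * pi) return -sin_first_quadrant(x - pi);      // 3rd quadrant
        return                    -sin_first_quadrant(two_pi - x);  // 4th quadrant
    }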
I've used trigonometric functions in C for over 20 years and 99% of the time I just use whatever built-in function is supplied. In the rare case I need more performance (or accuracy) as proven by testing or benchmarking, only then do I actually roll my own custom implementation for that specific case. I don't rewrite the entire gamut of <math.h> functions in the hope that one day I might need them.
I would suggest trying to code a few of these functions in as many ways as you can find and doing some accuracy and benchmark tests. This will give you some practical knowledge and give you some hard data on whether you actually need to reimplement these functions or not. At the very least this should give you some practical experience with implementing these types of functions, and chances are it will answer a lot of your questions in the process.

What's a good technique to unit test digital audio generation

I want to unit test a signal generator - let's say it generates a simple sine wave, or does frequency modulation of a signal onto a sine wave. It's easy enough to define sensible test parameters, and it's well known what the output should "look like" - but this is quite hard to test.
I could do (eg) a frequency analysis on the output and check that, check the maximum amplitude etc, but a) this will make the test code significantly more complicated than the code it's testing and b) doesn't fully test the shape of the output.
Is there an established way to do this?
One way to do this would be to capture a "known good" output and compare bit-for-bit against that. As long as your algorithm is deterministic you should get the same output every time. You might have to recalibrate it occasionally if anything changes, but at least you'll know if it does change at all.
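A minimal sketch of such a golden-master test (the generator interface and the file name are assumptions, not an established API; the trivial sine generator only makes the example runnable):

    #include <cassert>
    #include <cmath>
    #include <cstddef>
    #include <cstring>
    #include <fstream>
    #include <iterator>
    #include <vector>

    // Stand-in generator (assumed interface; your real signal generator goes here).
    std::vector<float> generate_sine(double freq, double sample_rate, std::size_t n)
    {
        std::vector<float> out(n);
        for (std::size_t i = 0; i < n; ++i)
            out[i] = static_cast<float>(std::sin(2.0 * 3.14159265358979323846 * freq * i / sample_rate));
        return out;
    }

    int main()
    {
        std::vector<float> output = generate_sine(440.0, 48000.0, 4800);

        // "golden_sine_440.raw" is a hypothetical file captured from a version of
        // the generator that was reviewed and listened to, kept under version control.
        std::ifstream golden("golden_sine_440.raw", std::ios::binary);
        std::vector<char> expected((std::istreambuf_iterator<char>(golden)),
                                   std::istreambuf_iterator<char>());

        assert(expected.size() == output.size() * sizeof(float));
        assert(std::memcmp(expected.data(), output.data(), expected.size()) == 0);
    }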
This situation is a strong argument for a modeling tool like Matlab, to generate and review a well-understood test set automatically, as well as to provide an environment for automatic comparison and scoring. Especially for instances where combinatorial explosions of test variations take place, automation makes it possible and straightforward to generate a huge dataset, locate problems, and pare back if needed to a representative qualification test set.
Often undervalued is the ability to generate a large, extensive set of tests exercising both the requirements and the limits of your design's implementation. Thinking about and designing those cases up front is also a huge advantage in producing a clean, problem-free system.
One possible semi-automated way of testing is to code up your signal generators from spec using 3 different algorithms, or perhaps with 3 different programmers in 3 different programming languages. Then randomly generate parameters within the complete range of legal control input values, and capture and compare the outputs of all 3 generators to see if they agree within some error bound. You could also include some typical and some suspected worst-case parameters. If the outputs always agree, then there's a much higher probability that everything works per spec than if they don't.
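A sketch of that cross-check; the three one-line "generators" below are stand-ins chosen only so the example runs, not real implementations from a spec:

    #include <cassert>
    #include <cmath>
    #include <cstdlib>

    // Three independent ways of computing the same sample value (stand-ins for
    // three independently written generators).
    double gen_a(double x) { return std::sin(x); }
    double gen_b(double x) { return std::cos(x - 1.5707963267948966); }            // sin(x) = cos(x - pi/2)
    double gen_c(double x) { return 2.0 * std::sin(x * 0.5) * std::cos(x * 0.5); } // double-angle identity

    int main()
    {
        std::srand(12345);                       // fixed seed keeps the test repeatable
        const double tolerance = 1e-12;

        for (int k = 0; k < 100000; ++k) {
            // Random argument within the legal input range (here: one full cycle).
            double x = 2.0 * 3.14159265358979323846 * (std::rand() / (double)RAND_MAX);
            double a = gen_a(x), b = gen_b(x), c = gen_c(x);
            assert(std::fabs(a - b) <= tolerance);
            assert(std::fabs(a - c) <= tolerance);
        }
    }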

How to test scientific software?

I'm convinced that software testing indeed is very important, especially in science. However, over the last 6 years, I never have come across any scientific software project which was under regular tests (and most of them were not even version controlled).
Now I'm wondering how you deal with software tests for scientific codes (numerical computations).
From my point of view, standard unit tests often miss the point, since there is no exact result, so using assert(a == b) might prove a bit difficult due to "normal" numerical errors.
So I'm looking forward to reading your thoughts about this.
I am also in academia and I have written quantum mechanical simulation programs to be executed on our cluster. I made the same observation regarding testing and even version control. It was even worse: in my case I am using a C++ library for my simulations, and the code I got from others was pure spaghetti code, no inheritance, not even functions.
I rewrote it and I also implemented some unit testing. You are correct that you have to deal with the numerical precision, which can be different depending on the architecture you are running on. Nevertheless, unit testing is possible, as long as you are taking these numerical rounding errors into account. Your result should not depend on the rounding of the numerical values, otherwise you would have a different problem with the robustness of your algorithm.
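A minimal sketch of what such a tolerance-aware assertion can look like in place of assert(a == b) (the names and values are placeholders, not from my simulation code):

    #include <algorithm>
    #include <cassert>
    #include <cmath>

    // Compare two floating-point results up to an absolute and a relative tolerance,
    // instead of demanding bit-exact equality.
    bool almost_equal(double a, double b, double abs_tol = 1e-12, double rel_tol = 1e-9)
    {
        double diff = std::fabs(a - b);
        double scale = std::max(std::fabs(a), std::fabs(b));
        return diff <= abs_tol || diff <= rel_tol * scale;
    }

    int main()
    {
        // Hypothetical quantity that should be conserved by a simulation step;
        // the values only illustrate tolerating rounding noise.
        double energy_before = 1.0;
        double energy_after  = 1.0 + 3e-16;
        assert(almost_equal(energy_before, energy_after));
        assert(!almost_equal(1.0, 1.001));       // a real discrepancy still fails
    }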
So, to conclude, I use unit testing for my scientific programs, and it really makes one more confident about the results, especially with regards to publishing the data in the end.
Just been looking at a similar issue (google: "testing scientific software") and came up with a few papers that may be of interest. These cover both the mundane coding errors and the bigger issues of knowing if the result is even right (depth of the Earth's mantle?)
http://http.icsi.berkeley.edu/ftp/pub/speech/papers/wikipapers/cox_harris_testing_numerical_software.pdf
http://www.cs.ua.edu/~SECSE09/Presentations/09_Hook.pdf (broken link; new link is http://www.se4science.org/workshops/secse09/Presentations/09_Hook.pdf)
http://www.associationforsoftwaretesting.org/?dl_name=DianeKellyRebeccaSanders_TheChallengeOfTestingScientificSoftware_paper.pdf
I thought the idea of mutation testing described in 09_Hook.pdf (see also matmute.sourceforge.net) is particularly interesting as it mimics the simple mistakes we all make. The hardest part is to learn to use statistical analysis for confidence levels, rather than single pass code reviews (man or machine).
The problem is not new. I'm sure I have an original copy of "How accurate is scientific software?" by Hatton et al., Oct 1994, which even then showed how different implementations of the same theories (as algorithms) diverged rather rapidly. (It's also ref 8 in the Kelly & Sanders paper.)
--- (Oct 2019)
More recently, there is Testing Scientific Software: A Systematic Literature Review.
I'm also using cpptest for its TEST_ASSERT_DELTA. I'm writing high-performance numerical programs in computational electromagnetics and I've been happily using it in my C++ programs.
I typically go about testing scientific code the same way as I do with any other kind of code, with only a few retouches, namely:
I always test my numerical codes for cases that make no physical sense and make sure the computation actually stops before producing a result. I learned this the hard way: I had a function that was computing some frequency responses, and then supplied a matrix built from them to another function as an argument, which eventually gave its answer as a single vector. The matrix could have been any size depending on how many terminals the signal was applied to, but my function was not checking whether the matrix size was consistent with the number of terminals (2 terminals should have meant a 2 x 2 x n matrix); however, the code itself was wrapped so as not to depend on that, and it didn't care what size the matrices were since it just had to do some basic matrix operations on them. Eventually, the results were perfectly plausible, well within the expected range and, in fact, partially correct -- only half of the solution vector was garbled. It took me a while to figure it out. If your data looks correct, it's assembled in a valid data structure and the numerical values are good (e.g. no NaNs or negative numbers of particles), but it doesn't make physical sense, the function has to fail gracefully.
I always test the I/O routines even if they are just reading a bunch of comma-separated numbers from a test file. When you're writing code that does twisted math, it's always tempting to jump into debugging the part of the code that is so math-heavy that you need a caffeine jolt just to understand the symbols. Days later, you realize you are also adding the ASCII value of \n to your list of points.
When testing for a mathematical relation, I always test it "by the book", and I also learned this by example. I've seen code that was supposed to compare two vectors but only checked for equality of elements and did not check for equality of length.
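A tiny sketch of such a "by the book" comparison (the helper name is made up):

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // "By the book": two vectors match only if they have the same length AND every
    // pair of elements agrees within a tolerance -- checking one without the other
    // is exactly the kind of bug described above.
    bool vectors_equal(const std::vector<double>& a, const std::vector<double>& b,
                       double tol = 1e-12)
    {
        if (a.size() != b.size())
            return false;
        for (std::size_t i = 0; i < a.size(); ++i)
            if (std::fabs(a[i] - b[i]) > tol)
                return false;
        return true;
    }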
Please take a look at the answers to the SO question How to use TDD correctly to implement a numerical method?

How should I Test a Genetic Algorithm

I have made quite a few genetic algorithms; they work (they find a reasonable solution quickly). But I have now discovered TDD. Is there a way to write a genetic algorithm (which relies heavily on random numbers) in a TDD way?
To pose the question more generally: how do you test a non-deterministic method/function? Here is what I have thought of:
Use a specific seed. This won't help if I make a mistake in the code in the first place, but it will help in finding bugs when refactoring.
Use a known list of numbers. Similar to the above, but I could follow the code through by hand (which would be very tedious).
Use a constant number. At least I know what to expect. It would be good to ensure that a dice always reads 6 when RandomFloat(0,1) always returns 1.
Try to move as much of the non-deterministic code out of the GA as possible, which seems silly as that is the core of its purpose.
Links to very good books on testing would be appreciated too.
Seems to me that the only way to test its consistent logic is to apply consistent input, ... or treat each iteration as a single automaton whose state is tested before and after that iteration, turning the overall nondeterministic system into testable components based on deterministic iteration values.
For variations/breeding/attribute inheritance in iterations, test those values on the boundaries of each iteration and test the global output of all iterations based on known input/output from successful iteration-subtests ...
Because the algorithm is iterative, you can use induction in your testing: show it works for 1 iteration and then for n+1 iterations given n, to demonstrate it will produce correct results (regardless of data determinism) for a given input range/domain and the constraints on possible values in the input.
Edit: I found this resource on strategies for testing nondeterministic systems, which might provide some insight. It might be helpful for statistical analysis of live results once the TDD/development process proves the logic is sound.
I would test random functions by testing them a number of times and analyzing whether the distribution of return values meets the statistical expectations (this involves some statistical knowledge).
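For example, a rough sketch (not a rigorous statistical test) that checks the sample mean and bin counts of a supposedly uniform source, here using the standard library generator as a stand-in:

    #include <cassert>
    #include <cmath>
    #include <cstdlib>
    #include <random>

    int main()
    {
        std::mt19937 gen(42);                        // fixed seed: the test stays repeatable
        std::uniform_real_distribution<double> dist(0.0, 1.0);

        const int n = 100000;
        const int bins = 10;
        int count[bins] = {0};
        double sum = 0.0;

        for (int i = 0; i < n; ++i) {
            double x = dist(gen);
            sum += x;
            ++count[static_cast<int>(x * bins)];
        }

        assert(std::fabs(sum / n - 0.5) < 0.01);     // sample mean close to 0.5
        for (int c : count)
            assert(std::abs(c - n / bins) < 500);    // each bin roughly n/10
    }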
If you're talking TDD, I would say definitely start out by picking a constant number and growing your test suite from there. I've done TDD on a few highly mathematical problems and it helps to have a few constant cases you know and have worked out by hand to run with from the beginning.
W/R/T your 4th point, moving nondeterministic code out of the GA, I think this is probably an approach worth considering. If you can decompose the algorithm and separate the nondeterministic concerns, it should make testing the deterministic parts straightforward. As long as you're careful about how you name things I don't think that you're sacrificing much here. Unless I am misunderstanding you, the GA will still delegate to this code, but it lives somewhere else.
As far as links to very good books on (developer) testing my favorites are:
Test Driven by Lasse Koskela
Working Effectively with Legacy Code by Michael Feathers
XUnit Test Patterns by Gerard Meszaros
Next Generation Java™ Testing: TestNG and Advanced Concepts by Cédric Beust & Hani Suleiman
One approach I use for unit testing the non-deterministic functions of GA algorithms is to put the selection of random numbers in a different function from the logic that uses those random numbers.
For example, if you have a function that takes a gene (a vector of something) and picks two random points of the gene to do something with them (mutation or whatever), you can put the generation of the random numbers in one function, and then pass them along with the gene to another function that contains the logic, given those numbers.
This way you can do TDD with the logic function: pass it certain genes and certain numbers, knowing exactly what the logic should do to the gene given those numbers, and write asserts on the modified gene.
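A minimal sketch of that separation (all names are made up for illustration):

    #include <cassert>
    #include <cstddef>
    #include <utility>
    #include <vector>

    // Pure logic: swap two loci of a gene. The "random" positions are plain
    // parameters, so the function is deterministic and easy to assert on.
    void swap_mutation(std::vector<int>& gene, std::size_t i, std::size_t j)
    {
        std::swap(gene[i], gene[j]);
    }

    int main()
    {
        std::vector<int> gene = {1, 2, 3, 4};
        swap_mutation(gene, 0, 3);                   // positions chosen by the test, not by an RNG
        assert((gene == std::vector<int>{4, 2, 3, 1}));
        // In production, a thin wrapper would pick i and j at random and delegate here.
    }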
Another way to test with the generation of random numbers is to externalize that generation to another class, which could be accessed via a context or loaded from a config value, and to use a different one for test executions. There would be two implementations of that class: one for production that generates actual random numbers, and another for testing that accepts upfront the numbers it will later generate. Then in the test you can provide the specific numbers that the class will supply to the tested code.
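A sketch of such a swappable generator (the interface and names are hypothetical):

    #include <cstddef>
    #include <random>
    #include <utility>
    #include <vector>

    // Abstract source of randomness that the GA code depends on.
    struct RandomSource {
        virtual double next() = 0;                   // value in [0, 1)
        virtual ~RandomSource() = default;
    };

    // Production implementation: real pseudo-random numbers.
    struct MersenneSource : RandomSource {
        std::mt19937 gen{std::random_device{}()};
        double next() override {
            return std::uniform_real_distribution<double>(0.0, 1.0)(gen);
        }
    };

    // Test implementation: plays back a scripted sequence, so every decision the
    // algorithm makes during a test is known in advance.
    struct ScriptedSource : RandomSource {
        std::vector<double> values;
        std::size_t index = 0;
        explicit ScriptedSource(std::vector<double> v) : values(std::move(v)) {}
        double next() override { return values[index++ % values.size()]; }
    };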
You could write a redundant neural network to analyze the results from your algorithm and have the output ranked based on expected outcomes. :)
Break your method down as much as you can. Then you can also have a unit test around just the random part to check the range of values. You can even have the test run it a few times to see if the result changes.
All of your functions should be completely deterministic. This means that none of the functions you are testing should generate the random number inside the function itself. You will want to pass that in as a parameter. That way, when your program is making decisions based on your random numbers, you can pass in representative numbers to test the expected output for that number. The only thing that shouldn't be deterministic is your actual random number generator, which you don't really need to worry too much about because you shouldn't be writing it yourself. You should be able to just assume it works as long as it's an established library.
That's for your unit tests. For your integration tests, if you are doing that, you might look into mocking your random number generation, replacing it with an algorithm that will return known numbers from 0..n for every random number that you need to generate.
I wrote a C# TDD Genetic Algorithm didactic application:
http://code.google.com/p/evo-lisa-clone/
Let's take the simplest random result method in the application: PointGenetics.Create, which creates a random point, given the boundaries. For this method I used 5 tests, and none of them relies on a specific seed:
http://code.google.com/p/evo-lisa-clone/source/browse/trunk/EvoLisaClone/EvoLisaCloneTest/PointGeneticsTest.cs
The randomness test is simple: for a large boundary (many possibilities), two consecutive generated points should not be equal. The remaining tests check other constraints.
Well, the most testable part is the fitness function, which is where all your logic will be. This can in some cases be quite complex (you might be running all sorts of simulations based on input parameters), so you want to be sure all that stuff works with a whole lot of unit tests, and this work can follow whatever methodology you like.
With regards to testing the GA parameters (mutation rate, cross-over strategy, whatever), if you're implementing that stuff yourself you can certainly test it (you can again have unit tests around the mutation logic etc.), but you won't be able to test the 'fine-tuning' of the GA.
In other words, you won't be able to test if GA actually performs other than by the goodness of the solutions found.
A test that the algorithm gives you the same result for the same input could help you, but sometimes you will make changes that alter the result-picking behavior of the algorithm.
I would put the most effort into having a test that ensures the algorithm gives you a correct result. If the algorithm gives you a correct result for a number of static seeds and random values, then the algorithm works, or at least has not been broken by the changes made.
Another opportunity in TDD is the possibility of evaluating the algorithm. If you can automatically check how good a result is, you could add tests that show that a change hasn't lowered the quality of your results or increased your calculation time unreasonably.
If you want to test your algorithm with many base seeds, you may want to have two test suites: one suite that runs a quick test after every save, to ensure that you haven't broken anything, and one suite that runs for a longer time, for later evaluation.
I would highly suggest looking into using mock objects for your unit test cases (http://en.wikipedia.org/wiki/Mock_object). You can use them to mock out objects that make random guesses in order to cause you to get expected results instead.