Help unit testing cascading calculations

I have a class that is used as part of a financial application. The class is bound to a UI form and accepts a handful of values that are then used to calculate financial data. The calculated values are presented as properties on the object. Several of these calculations use other calculations.
For example, CalculationA may return PropertyA + PropertyB. CalculationB returns PropertyC - CalculationA. (This is an extreme over-simplification).
I am trying to write some unit tests to make sure that these calculations are performed correctly and wondering what approach I should be taking.
My first approach was to manually recalculate the expected result in the test method. For example, when testing CalculationB, I populate the test object and then set the expected result equal to PropertyC - (PropertyA + PropertyB). But since the real object has 25 properties involved, this is quite cumbersome.
One option I thought of is to simply create the test object, populate it with values, then write a test that verifies CalculationA equals PropertyA + PropertyB and another test that verifies CalculationB equals PropertyC - CalculationA. The latter assumes that CalculationA is correct, but does that really matter for the purpose of the unit test?
What guidance/suggestions can you offer for setting up my tests so they are accurate, reliable and maintainable? What is the best way to ensure that the calculations are correct and that I didn't accidentally set CalculationB = PropertyB - CalculationA, for instance?

Your case sounds equivalent to a spreadsheet, and a spreadsheet is just unusual syntax for code of the form:
f1(f2(a, f3(b)), c);
Where f1-3 are the calculations, and a-c the input properties. The 'chaining' is the fact that the outputs of some functions are used as inputs to others.
This kind of functional calculation code is where unit testing really shines. If you tested the assembly as a whole, a change to the specification of f3 would change the expected results of the tests that really concern f2 and f1, in some complicated but meaningless way. That could well end with someone cutting and pasting the calculated result back into the test as the expected result, which makes the whole exercise a little pointless.
So if a minimal set of test cases is something like:
f1(7, -2) => 23
f2(1, 2) => 7
f3(4) => 5
then you can implement each of those test cases by:
1. set all properties to fixed, large numbers
2. set the input properties to the inputs for this case
3. check the output property
Because point one is shared between all tests, the effort to produce test cases for each calculation is proportional only to the complexity of that specific calculation, not the total number of properties.
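A minimal sketch of that pattern in Python (the FinancialCalculator class, its property names, and the numbers are made up for illustration; the real object would have its 25 properties instead):

class FinancialCalculator:
    # Hypothetical stand-in for the UI-bound class in the question.
    def __init__(self):
        self.property_a = 0
        self.property_b = 0
        self.property_c = 0

    @property
    def calculation_a(self):
        return self.property_a + self.property_b

    @property
    def calculation_b(self):
        return self.property_c - self.calculation_a


def make_calculator():
    # Step 1: set every input property to a fixed, recognisable value.
    calc = FinancialCalculator()
    calc.property_a = 1_000_000
    calc.property_b = 1_000_000
    calc.property_c = 1_000_000
    return calc


def test_calculation_a():
    calc = make_calculator()
    # Step 2: override only the inputs relevant to this calculation.
    calc.property_a = 3
    calc.property_b = 4
    # Step 3: check the one output property.
    assert calc.calculation_a == 7


def test_calculation_b():
    calc = make_calculator()
    calc.property_a = 3
    calc.property_b = 4
    calc.property_c = 10
    assert calc.calculation_b == 3  # 10 - (3 + 4)

Note that test_calculation_b states the expected value as a plain number rather than re-deriving it from the other properties, so a sign mix-up in the production code cannot be mirrored in the test.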

Related

Number of test-cases for a boolean function

I'm confused about the number of test cases used for a boolean function. Say I'm writing a function to check whether the sale price of something is over $60.
function checkSalePrice(price) {
    return (price > 60)
}
In my Advanced Placement course, they ask that the minimum set of tests include boundary values, so in this case an example set of tests is [30, 60, 90]. The course I'm taking now says to test only two values, one lower and one higher, e.g. (30, 90).
Which is correct? (I know this is pondering the depth of a cup of water, but I'd like to get a few more samples as I'm new to programming)
Kent Beck wrote
I get paid for code that works, not for tests, so my philosophy is to test as little as possible to reach a given level of confidence (I suspect this level of confidence is high compared to industry standards, but that could just be hubris). If I don't typically make a kind of mistake (like setting the wrong variables in a constructor), I don't test for it. I do tend to make sense of test errors, so I'm extra careful when I have logic with complicated conditionals. When coding on a team, I modify my strategy to carefully test code that we, collectively, tend to get wrong.
Me? I make fencepost errors. So I would absolutely want to be sure that my test suite would catch the following incorrect implementation of checkSalePrice:
function checkSalePrice(price) {
    return (price >= 60)
}
If I were writing checkSalePrice using test-driven-development, then I would want to calibrate my tests by ensuring that they fail before I make them pass. Since in my programming environment a trivial boolean function returns false, my flow would look like
assert checkSalePrice(61)
This would fail, because the method by default returns false. Then I would implement
function checkSalePrice(price) {
    return true
}
Now my first check passes, so I know that this boundary case is correctly covered. I would then add a new check
assert ! checkSalePrice(60)
which would fail. Providing the corrected implementation would pass the check, and now I can confidently refactor the method as necessary.
Adding a third check here for an arbitrary value isn't going to provide additional safety when changing the code, nor is it going to make the life of the next maintainer any easier, so I would settle for two cases here.
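For illustration, those two checks might end up as the following tests (sketched here in Python/pytest rather than the JavaScript of the example; the function is a direct transliteration):

def check_sale_price(price):
    # Python transliteration of the corrected checkSalePrice.
    return price > 60


def test_price_just_above_boundary_is_on_sale():
    assert check_sale_price(61)


def test_price_at_boundary_is_not_on_sale():
    assert not check_sale_price(60)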
Note that the heuristic I'm using is not related to the complexity of the returned value, but to the complexity of the method.
Complexity of the predicate might include covering various problems reading the input. For instance, if we were passing a collection, what cases do we want to make sure are covered? J. B. Rainsberger suggested the following mnemonic:
zero
one
many
lots
oops
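Read against a hypothetical function that sums a collection of prices, the mnemonic might translate into checks like these (total_sale_price and the cases are purely illustrative):

import pytest


def total_sale_price(prices):
    # Hypothetical function under test: sums a collection of prices.
    return sum(prices)


def test_zero_items():
    assert total_sale_price([]) == 0

def test_one_item():
    assert total_sale_price([10]) == 10

def test_many_items():
    assert total_sale_price([10, 20, 30]) == 60

def test_lots_of_items():
    assert total_sale_price([1] * 100_000) == 100_000

def test_oops_non_numeric_item():
    # "oops": malformed input should fail loudly rather than silently.
    with pytest.raises(TypeError):
        total_sale_price([10, "twenty"])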
Bruce Dawson points out that there are only 4 billion floats, so maybe you should test them all.
Do note, though, that those extra 4 billion minus two checks aren't adding a lot of design value, so we've probably crossed from TDD into a different realm.
You stumbled into one of the big problems with testing in general - how many tests are good enough?!
There are basically three ways to look at this:
black box testing: you do not care about the internals of your MuT (method under test). You only focus on the contract of the method. In your case: it should return true when price > 60. When you think about this for a while, you would find tests 30 and 90 ... and maybe 60 as well. It is always good practice to test corner cases. So the answer would be: 3
white box testing: you do coverage measurements of your tests - and you strive, for example, to hit all paths at least once. In this case, you could go with 30 and 90, which would result in 100% coverage. So the answer here: 2
randomized testing, as guided by QuickCheck. This approach is very different: you don't specify test cases at all. Instead you step back and identify rules that should hold true about your MuT. The framework then creates random input and invokes your MuT with it, trying to find examples where the aforementioned rules break.
In your case, such a rule could be: when checkSalePrice(a) and checkSalePrice(b), then checkSalePrice(a+b). This approach feels unusual at first, but as soon as you start exploring its possibilities, you can find very interesting things in it. Especially when you understand that your code can provide the required "creator" functions to the framework. That allows you to use this approach to test even much more complicated, "object oriented" stuff. It is just great to watch the framework find a flaw - and to then realize that the framework will even find the "minimum" example data required to break a rule that you specified.
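As a sketch of that style in Python, the Hypothesis library plays the QuickCheck role; the property below mirrors the rule above and is only an example:

from hypothesis import given, strategies as st


def check_sale_price(price):
    return price > 60


@given(st.floats(min_value=61, max_value=1e9),
       st.floats(min_value=61, max_value=1e9))
def test_sum_of_two_sale_prices_is_a_sale_price(a, b):
    # Rule: if a and b both count as sale prices, then so does a + b.
    assert check_sale_price(a)
    assert check_sale_price(b)
    assert check_sale_price(a + b)

When the property fails, Hypothesis shrinks the random input to a minimal counterexample, which is the "minimum" example data behaviour described above.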

How to write unit tests for VIs that contain the "Tick Count (ms)" LabVIEW function?

There is a VI whose outputs (indicators) depend not only on the inputs but also on the values of "Tick Count" functions. The problem is that it does not produce the same output for the same inputs: each time I run it, it gives different outputs, so a unit test that only captures inputs and outputs would fail. So the question is: how do I write a unit test for this situation?
I cannot include the VI in the question as it contains several subVIs, and the "tick count" functions are spread through all levels of its subVIs.
EDIT1: I wrote a wrapper that subtracts the output values of two consecutive runs in order to eliminate the base reference time (which is undefined in this function) but it spoils the outputs.
I think you have been given a very difficult task, as the function you've been asked to test is non-deterministic and it is challenging to write unit tests against non-deterministic code.
There are some ways to test non-deterministic functions: for example, one could test that a random number generator produced values uniformly distributed to some tolerance, or test that a clock-setting function matched an NTP server to some tolerance. But I think your team will be happier if you can make the underlying code deterministic.
Your idea to use conditional disable is good, but I would add the additional step of creating a wrapper VI and then searching for and replacing all native Tick Count calls with it. This way you can make any modifications to Tick Count in one place. If for some reason the code actually uses the tick count for something other than profiling (for example, it is being used to seed a pseudorandom number generator), you can have your "test/debug" case read from a Notifier that you inject a set of fake tick counts into from your testing code. Notifiers work great for something like this.
You could add an (optional) input that allows you to override the tick count value. Give it a default value of -1, and in the VI use the real tick count value only when that input is -1.
However I have never seen code relying on tick count.
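LabVIEW VIs are graphical, so there is no text code to show, but the override idea translates roughly to this sketch (Python used as pseudocode here; tick_count_ms, get_elapsed_ms and the -1 convention are illustrative):

import time


def tick_count_ms(override_ms=-1):
    # Mirrors the suggested optional input: -1 means "use the real clock",
    # anything else is a value injected by the test.
    if override_ms == -1:
        return int(time.monotonic() * 1000)
    return override_ms


def get_elapsed_ms(start_override=-1, end_override=-1):
    # Hypothetical VI body whose output depends on two tick counts.
    start = tick_count_ms(start_override)
    # ... the real work would happen here ...
    end = tick_count_ms(end_override)
    return end - start


def test_elapsed_is_difference_of_injected_ticks():
    # With both tick counts injected, the output is deterministic.
    assert get_elapsed_ms(start_override=1000, end_override=1250) == 250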

How should I test random choices?

My domain involves a randomly determined choice and I'm not sure how to unit test it.
As a trivial example, let's say I want to ensure that myobject.makeChoice() returns true 75% of the time and false 25% of the time. How could I unit test this?
I could assert that myobject.getChoiceAPercent() is 75, but that seems trivial, not useful, and unsatisfactory since it isn't testing the actual outcome.
I could run myobject.makeChoice() 1,000 times and assert that it returns true 70% to 80% of the time, or use some statistical method like that, but that seems fragile, slow, and unsatisfactory.
I could run a unit test with a predetermined random generator or random seed and assert that makeChoice() run 5 times returns [true, true, false, true, true], for example, but that seems the same as asserting that random(123) == 456, and also unsatisfactory since I wouldn't be testing the actual domain I'm interested in.
It seems that random choices can be proven correct with inductive reasoning about the random generator itself, but not with unit testing. So is randomly generated content not amenable to automated testing, or is there an easy way that I'm not aware of?
[edit] To avoid disputes over "true random" vs "pseudo random" etc, let's assume the following implementation:
public boolean makeChoice() {
    return this.random.nextDouble() < 0.75;
}
How do I unit test that makeChoice returns true about 75% of the time?
Testing for randomness doesn't seem random. But to test for uniqueness/collision, I normally use a hash structure and insert the random value as a key. Duplicate keys are overwritten. By counting the final number of unique keys versus the total number of iterations, you can "test" the uniqueness of your algorithm.
Your code as written is not decoupled from the RNG. Maybe you could write it like this:
public boolean makeChoice(double randnum) {
    return randnum < 0.75;
}
And then you test key values to test the implementation.
Or you could initialize the random object with a specific seed that gives known random numbers in [0, 1), and test against what you expect to happen with those known numbers.
Or you could define an IRandom interface, write a facade for Random that implements it, and use that in the program. Then you can test with a mock IRandom that gives, in order, the numbers 0.00, 0.01, 0.02, ..., 0.99, 1.00 and count the number of successes.
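A sketch of that idea (translated to Python; FakeRandom, Chooser and next_double are illustrative names, with the makeChoice logic transliterated from the question):

class FakeRandom:
    # Stand-in for the mock IRandom: returns a scripted sequence of values.
    def __init__(self, values):
        self._values = iter(values)

    def next_double(self):
        return next(self._values)


class Chooser:
    def __init__(self, random_source):
        self.random = random_source

    def make_choice(self):
        return self.random.next_double() < 0.75


def test_make_choice_true_for_75_of_101_evenly_spaced_values():
    # Feed the values 0.00, 0.01, ..., 1.00 and count the successes.
    values = [i / 100 for i in range(101)]
    chooser = Chooser(FakeRandom(values))
    successes = sum(chooser.make_choice() for _ in values)
    assert successes == 75  # 0.00 .. 0.74 fall below the 0.75 threshold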
Don't test the randomness of your code; test the results of the randomness by passing a value in, or by grabbing one from a random number store.
It's a nice goal to get 100% unit test coverage, but it is subject to the law of diminishing returns. Did you write the PRNG?
Edit: Also check out this answer as it has a couple of good links: How do you test that something is random? Or "random enough"?
I second the "decoupling" strategy. I like to think that any code that depends on random values, or on the time, should just treat it, deterministically, as "yet another input" from "yet another dependency". Then you inject a clock, or an RNG, that you're going to write or trust.
For example, in your case, is your object really going to behave differently if the "choice" is true 80% of the time rather than 75% of the time? I suspect there is a large part of your code that simply cares whether the choice is true or false, and another part that makes the choice.
This opens the question of how you would test a random generator, in which case I suppose relying on the law of large numbers, some approximation, math, and simply trusting rand() is the better way to go.
Since it's a random boolean, writing 2 tests (one for TRUE, one for FALSE) might suffice if the behavior doesn't depend on the past "random" results (it's not clear, to me at least, from the question).
In other words: if consecutive outcomes don't depend on each other, you might test a single TRUE scenario, a single FALSE scenario and may be just fine.

Testing functions with random output

I am working on a test project to test a neural network library.
The problem is that this library sometimes uses random numbers.
I need to derive test cases (input, expected output, actual output).
Does anybody have an idea how to derive test cases (input, expected output, actual output) for a function that uses random numbers when taking actions and evaluating outputs?
Yes, you either have to run a large enough number of cases so that the randomness averages out, or you make the random source another input to your function or method so you can test it independently.
An example of the first kind (this is Python, but the principle can apply in any language).
import random

def test_random_number():
    total = sum(random.uniform(0, 1) for _ in range(1000))
    assert 100 < total < 900
So this test can fail if you're unlucky, but it's still a reasonable test since it'll pass nearly all the time, and it's pretty simple to make this kind of test.
To do things 'properly', you need to inject the random source.
import random

class DefaultRandomBehavior(object):
    def pick_left_or_right(self):
        return random.choice(['left', 'right'])

class AardvarkModeller(object):
    def __init__(self, random_source=None):
        self.random_source = random_source or DefaultRandomBehavior()

    def aardvark_direction(self):
        r = self.random_source.pick_left_or_right()
        return 'The aardvark faces ' + r
Now you can unit test this by either mocking out or faking the DefaultRandomBehavior class, thus completely side-stepping the non-determinism.
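For instance, a hand-rolled fake might look like this (FakeRandomBehavior is illustrative; AardvarkModeller is the class from the snippet above):

class FakeRandomBehavior(object):
    # Always picks the same direction, making the output deterministic.
    def pick_left_or_right(self):
        return 'left'


def test_aardvark_direction_with_fake_randomness():
    modeller = AardvarkModeller(random_source=FakeRandomBehavior())
    assert modeller.aardvark_direction() == 'The aardvark faces left'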
It's unlikely that the library is really using truly random numbers, as computers just aren't very good at generating those. Instead it's probably using a pseudo-random number generator seeded in some way, possibly from a 'real' random source or maybe from the current time. One way to make your results reproducible would be to teach the library to accept a user-supplied PRNG seed and set this to some constant for your test cases. The internal sequence of random numbers would then always be the same for your tests.
The second (and maybe more useful) approach would be to compare the expected output and actual output in an approximate way. If the use of random numbers makes such a big difference to your calculation that the results are really not reproducible, you may want to think about the usefulness of the calculation. The trick is to find some properties of the library's output which can be compared numerically, with an allowable error; I suspect you would want to compare the results of doing something with the neural network rather than compare the networks directly.
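A minimal sketch of that kind of approximate check, assuming a hypothetical predict wrapper around the trained network (the names and the tolerance are illustrative):

import math


def fake_trained_network(inputs):
    # Stand-in for a trained network's forward pass.
    return 0.4971


def predict(network, inputs):
    # Hypothetical wrapper: run the network on the inputs, return one value.
    return network(inputs)


def test_prediction_close_to_expected():
    expected = 0.5
    actual = predict(fake_trained_network, [1.0, 0.0])
    # Compare with an allowable error instead of exact equality.
    assert math.isclose(actual, expected, abs_tol=0.05)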

Unit testing a method that can have random behaviour

I ran across this situation this afternoon, so I thought I'd ask what you guys do.
We have a randomized password generator for user password resets and while fixing a problem with it, I decided to move the routine into my (slowly growing) test harness.
I want to test that passwords generated conform to the rules we've set out, but of course the results of the function will be randomized (or, well, pseudo-randomized).
What would you guys do in the unit test? Generate a bunch of passwords, check they all pass and consider that good enough?
A unit test should do the same thing every time that it runs, otherwise you may run into a situation where the unit test only fails occasionally, and that could be a real pain to debug.
Try seeding your pseudo-randomizer with the same seed every time (in the test, that is--not in production code). That way your test will generate the same set of inputs every time.
If you can't control the seed and there is no way to prevent the function you are testing from being randomized, then I guess you are stuck with an unpredictable unit test. :(
The function is a hypothesis that, for all inputs, the output conforms to the specification. The unit test is an attempt to falsify that hypothesis. So yes, the best you can do in this case is to generate a large number of outputs. If they all pass your specification, then you can be reasonably sure that your function works as specified.
Consider putting the random number generator outside this function and passing a random number to it, making the function deterministic, instead of having it access the random number generator directly. This way, you can generate a large number of random inputs in your test harness, pass them all to your function, and test the outputs. If one fails, record what that value is so that you have a documented test case.
In addition to testing a few to make sure that they pass, I'd write a test to make sure that passwords that break the rules fail.
Is there anything in the codebase that's checking the passwords generated to make sure they're random enough? If not, I may look at creating the logic to check the generated passwords, testing that, and then you can state that the random password generator is working (as "bad" ones won't get out).
Once you've got that logic, you can probably write an integration-type test that generates boatloads of passwords and passes them through the logic, at which point you'd get an idea of how "good" your random password generator is.
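Such a bulk check might look something like this (generate_password and meets_password_rules are hypothetical stand-ins for the real routine and the rule-checking logic):

import random
import string


def generate_password(rng):
    # Hypothetical generator: one digit, one letter, then random padding.
    chars = [rng.choice(string.digits), rng.choice(string.ascii_letters)]
    chars += [rng.choice(string.ascii_letters + string.digits) for _ in range(10)]
    rng.shuffle(chars)
    return ''.join(chars)


def meets_password_rules(password):
    # Hypothetical rule checker mirroring the rules the generator must satisfy.
    return (len(password) >= 8
            and any(c.isdigit() for c in password)
            and any(c.isalpha() for c in password))


def test_bulk_generated_passwords_satisfy_rules():
    rng = random.Random(1234)  # fixed seed keeps the test repeatable
    for _ in range(10_000):
        assert meets_password_rules(generate_password(rng))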
Well, considering they are random, there is no real way to be sure, but testing 100,000 passwords should clear most doubts :)
You could seed your random number generator with a constant value in order to get non-random results and test those results.
I'm assuming that the user-entered passwords conform to the same restrictions as the randomly generated ones. So you probably want to have a set of static passwords for checking known conditions, and then a loop that does the dynamic password checks. The size of the loop isn't too important, but it should be large enough that you get that warm fuzzy feeling from your generator, and not so large that your tests take forever to run. If anything crops up over time, you can add those cases to your static list.
In the long run though, a weak password isn't going to break your program, and password security ultimately rests in the hands of the user. So your priority should be to make sure that the dynamic generation and the strength check don't break the system.
Without knowing what your rules are it's hard to say for sure, but assuming they are something like "the password must be at least 8 characters with at least one upper case letter, one lower case letter, one number and one special character", then it's impossible even with brute force to check sufficient quantities of generated passwords to prove the algorithm is correct (as that would require somewhere over 8^70 ≈ 1.65x10^63 checks, depending on how many special characters you designate for use, which would take a very, very long time to complete).
Ultimately all you can do is test as many passwords as is feasible, and if any break the rules then you know the algorithm is incorrect. Probably the best thing to do is leave it running overnight, and if all is well in the morning you're likely to be OK.
If you want to be doubly sure in production, then implement an outer function that calls the password generation function in a loop and checks it against the rules. If it fails then log an error indicating this (so you know you need to fix it) and generate another password. Continue until you get one that meets the rules.
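A rough sketch of that production-side guard (the generate and meets_rules callables are placeholders for the real generator and rule check):

import logging


def generate_valid_password(generate, meets_rules, max_attempts=100):
    # Retry until the generated password passes the rules, logging each failure.
    for _ in range(max_attempts):
        password = generate()
        if meets_rules(password):
            return password
        logging.error("Generated password broke the rules; regenerating")
    raise RuntimeError(f"Password generator failed {max_attempts} times in a row")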
You can also look into mutation testing (Jester for Java, Heckle for Ruby)
In my humble opinion you do not want a test that sometimes passes and sometimes fails. Some people may even consider that this kind of test is not a unit test. But the main idea is to be sure that the function is OK when you see the green bar.
With this principle in mind, you may try to execute it a reasonable number of times so that the chance of a false pass is almost zero. However, a single failure of the test will force you to make more extensive tests, apart from debugging the failure.
Either use a fixed random seed or make it reproducible (e.g. derive it from the current day).
Firstly, use a seed for your PRNG. Your input is then no longer random, which gets rid of the problem of unpredictable output - i.e. your unit test is now deterministic.
This doesn't, however, solve the problem of testing the implementation, but here is an example of how a typical method that relies upon randomness can be tested.
Imagine we've implemented a function that takes a collection of red and blue marbles and picks one at random, but a weighting can be assigned to the probability, i.e. weights of 2 and 1 would mean red marbles are twice as likely to be picked as blue marbles.
We can test this by setting the weight of one choice to zero and verifying that in all cases (in practice, for a large amount of test input) we always get e.g. blue marbles. Reversing the weights should then give the opposite result (all red marbles).
This doesn't guarantee our function is behaving as intended (if we pass in an equal number of red and blue marbles and have equal weights do we always get a 50/50 distribution over a large number of trials?) but in practice it is often sufficient.
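A sketch of that zero-weight check, assuming a hypothetical pick_marble(marbles, weights, rng) function (the name and signature are illustrative):

import random


def pick_marble(marbles, weights, rng):
    # Hypothetical implementation: weighted random choice of a single marble.
    return rng.choices(marbles, weights=weights, k=1)[0]


def test_zero_weight_red_is_never_picked():
    rng = random.Random(0)
    # With red's weight set to zero, every pick must be blue.
    assert all(pick_marble(['red', 'blue'], [0, 1], rng) == 'blue'
               for _ in range(1000))


def test_zero_weight_blue_is_never_picked():
    rng = random.Random(0)
    assert all(pick_marble(['red', 'blue'], [2, 0], rng) == 'red'
               for _ in range(1000))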