Testing functions with random output - unit-testing

I am working on a test project to test a neural networks library. The problem is that this library sometimes uses random numbers.
Does anybody have an idea how to derive test cases (input, expected output, actual output) for a function that uses random numbers when taking actions and evaluating outputs?

Yes, you either have to run a large enough number of cases so that the randomness averages out, or you have to make the random source another input to your function or method so you can test it independently.
An example of the first kind (this is Python, but the principle applies in any language):
import random

def test_random_number():
    total = sum(random.uniform(0, 1) for _ in range(1000))
    assert 100 < total < 900
So this test can fail if you're unlucky, but it's still a reasonable test since it'll pass nearly all the time, and it's pretty simple to make this kind of test.
To do things 'properly', you need to inject the random source.
import random

class DefaultRandomBehavior(object):
    def pick_left_or_right(self):
        return random.choice(['left', 'right'])

class AardvarkModeller(object):
    def __init__(self, random_source=None):
        self.random_source = random_source or DefaultRandomBehavior()

    def aardvark_direction(self):
        r = self.random_source.pick_left_or_right()
        return 'The aardvark faces ' + r
Now you can unit test this by either mocking out or faking the DefaultRandomBehavior class, thus completely side-stepping the non-determinism.
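For example, a test could inject a canned behaviour instead of the random one (a minimal sketch reusing the classes above; AlwaysLeft is invented for illustration):

class AlwaysLeft(object):
    def pick_left_or_right(self):
        return 'left'  # deterministic stand-in for the random source

def test_aardvark_faces_left():
    modeller = AardvarkModeller(random_source=AlwaysLeft())
    assert modeller.aardvark_direction() == 'The aardvark faces left'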

It's unlikely that the library is really using random numbers as computers just aren't very good at generating those. Instead it's probably using a pseudo-random number generator seeded in some way, possibly from a 'real' random source or maybe from the current time. One way to make your results reproducible would be to teach the library to be able to accept a user supplied PRNG seed and set this to some constant for your test cases. The internal sequence of random numbers would then always be the same for your tests.
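For example, with Python's standard library PRNG, the test might look like this (a sketch; train_network and data are placeholders for whatever the library under test actually exposes):

import random

def test_training_is_reproducible():
    random.seed(42)              # fix the PRNG sequence
    first = train_network(data)  # hypothetical call into the library
    random.seed(42)              # same seed, same 'random' numbers
    second = train_network(data)
    assert first == second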
The second (and maybe more useful) approach would be to compare the expected output and actual output in an approximate way. If the use of random numbers makes such a big difference to your calculation that the results are really not reproducible, you may want to think about the usefulness of the calculation. The trick would be to find some properties of the library's output which can be compared numerically, with an allowable error; I suspect you would want to compare the results of doing something with the neural network rather than compare the networks directly.
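A sketch of that approximate comparison (evaluate, network, test_set and the expected accuracy 0.92 are all placeholders, not part of any real library):

def test_accuracy_is_close_enough():
    # Compare a numeric property of the output, with an allowable error,
    # rather than comparing the trained networks directly.
    accuracy = evaluate(network, test_set)  # hypothetical
    assert abs(accuracy - 0.92) < 0.05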

Related

How to write a unit test to ensure output is within certain limits?

If I have a requirement that says the output of a function must always be within certain limits, how can I test this if the function returns unpredictable results?
For example, let's say I have a function that generates random numbers or passwords; how can I test that the number is always between 1000 and 9999, or that the password is always between 6 and 12 characters?
Let's use the number generator example: currently I would write two tests, one that runs the function in a finite loop and checks that each result is >= 1000, and a second that ensures each output is <= 9999. Obviously the loop must be run a large number of times (>9K) for each test to have any chance of covering all the possible output values.
Are there any better approaches (i.e. ones that would be considered more reliable) when testing the output of unpredictable functions?
Generally you want to test every possible input, and make sure that the output is what you expect.
If you are testing a function with a random aspect, then this is difficult to do because it will have different outputs for the same input.
If possible, I would find where exactly the random value is used and then test every possible value that might come from that random value generator. But if you are testing the random value generator itself, then the method you have described seems to be the best option.
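A minimal sketch of that loop-based check, assuming a hypothetical generate_number() under test:

def test_generated_numbers_stay_in_range():
    for _ in range(100_000):
        n = generate_number()  # hypothetical function under test
        assert 1000 <= n <= 9999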

How to write unit test for VIs that contain "Tick Count (ms)" LabVIEW function?

There is a VI whose outputs (indicators) depend not only on the inputs but also on the values of "Tick Count" functions. The problem is that it does not produce the same output for the same inputs: each time I run it, it gives different outputs, so a unit test that only captures inputs and outputs would fail. So the question is how to write a unit test for this situation.
I cannot include the VI in the question as it contains several subVIs and the "tick count" functions are spread through all level of its subVIs.
EDIT1: I wrote a wrapper that subtracts the output values of two consecutive runs in order to eliminate the base reference time (which is undefined in this function) but it spoils the outputs.
I think you have been given a very difficult task, as the function you've been asked to test is non-deterministic and it is challenging to write unit tests against non-deterministic code.
There are some ways to test non-deterministic functions: for example, one could test that a random number generator produced values uniformly distributed to some tolerance, or test that a clock-setting function matched an NTP server to some tolerance. But I think your team will be happier if you can make the underlying code deterministic.
Your idea to use conditional disable is good, but I would add the additional step of creating a wrapper VI and then searching for and replacing all native Tick Count calls with it. This way you can make any modifications to Tick Count in one place. If for some reason the code actually uses the tick count for something other than profiling (for example, if it is being used to seed a pseudorandom number generator), you can have your "test/debug" case read from a Notifier into which your testing code injects a set of fake tick counts. Notifiers work great for something like this.
You could add an optional input that allows you to override the tick count value. Give it a default value of -1, and in the VI use the real tick count whenever this input is -1.
However, I have never seen code relying on the tick count.
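For what it's worth, the override pattern looks like this when sketched in text-based code rather than LabVIEW wiring (elapsed_ms is invented for illustration):

import time

def elapsed_ms(start_ms, tick_override=-1):
    # Use the real tick count unless the test supplies an override.
    now = tick_override if tick_override != -1 else int(time.monotonic() * 1000)
    return now - start_ms

# In a test, the clock is now fully under the test's control:
assert elapsed_ms(1000, tick_override=1500) == 500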

How should I test random choices?

My domain involves a randomly determined choice and I'm not sure how to unit test it.
As a trivial example, let's say I want to ensure that myobject.makeChoice() returns true 75% of the time and false 25% of the time. How could I unit test this?
I could assert that myobject.getChoiceAPercent() is 75, but that seems trivial, not useful, and unsatisfactory, since it isn't testing the actual outcome.
I could run myobject.makeChoice() 1,000 times and assert that it returns true 70% to 80% of the time, or use some statistical method like that, but that seems fragile, slow, and unsatisfactory.
I could run a unit test with a predetermined random generator or random seed and assert that makeChoice() run 5 times returns [true, true, false, true, true], for example, but that seems the same as asserting that random(123) == 456, and also unsatisfactory, since I wouldn't be testing the actual domain I'm interested in.
It seems that random choices can be proven correct with inductive reasoning about the random generator itself, but not with unit testing. So is randomly generated content not amenable to automated testing, or is there an easy way that I'm not aware of?
[edit] To avoid disputes over "true random" vs "pseudo random" etc, let's assume the following implementation:
public boolean makeChoice() {
    return this.random.nextDouble() < 0.75;
}
How do I unit test that makeChoice returns true about 75% of the time?
Testing for randomness doesn't seem random. But to test for uniqueness/collision, I normally use a hash structure and insert the random value as a key. Duplicate keys are overwritten. By counting the final number of unique keys versus the total number of iterations, you can "test" the uniqueness of your algorithm.
Your code as written is not decoupled from the RNG. Maybe you could write it like this:
public boolean makeChoice(double randnum) {
    return randnum < 0.75;
}
And then you pass in key values to test the implementation.
Or you could initialize the random object to a specific seed, that gives known random numbers between [0, 1) and test against what you expect to happen with those known numbers.
Or you could define an IRandom, write a front for Random that implements the interface and use it in the program. Then you can test it with a mock IRandom that gives, in order, numbers 0.00, 0.01, 0.02..., 0.99, 1.00 and count the number of successes.
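A sketch of that last idea, translated to Python (FakeRandom and make_choice are illustrative stand-ins for the IRandom mock and the method under test):

class FakeRandom(object):
    # Yields 0.00, 0.01, ..., 0.99 in order instead of random values.
    def __init__(self):
        self._values = iter(i / 100.0 for i in range(100))
    def next_double(self):
        return next(self._values)

def make_choice(rng):
    return rng.next_double() < 0.75

def test_make_choice_is_true_75_percent_of_the_time():
    rng = FakeRandom()
    successes = sum(1 for _ in range(100) if make_choice(rng))
    assert successes == 75  # exactly 75 of the 100 evenly spaced values are < 0.75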
Don't test the randomness of your code; test the results of the randomness, by passing a value in or grabbing one from a random number store.
It's a nice goal to get 100% unit test coverage but it is the law of diminishing returns. Did you write the PRNG?
Edit: Also check out this answer as it has a couple of good links: How do you test that something is random? Or "random enough"?
I second the "decoupling" strategy. I like to think that anything that depends on random values, or on the time, should just treat them, deterministically, as "yet another input" from "yet another dependency". Then you inject a clock, or an RNG, that you're going to write or trust.
For example, in your case, is your object really going to behave differently if the "choice" is true 80% of the time rather than 75% of the time ? I suspect there is a large part of your code that simply cares whether the choice is true or false, and another one that makes the choice.
This opens the question of how you would test the random generator itself, in which case I suppose relying on the law of large numbers, some approximation, math, and simply trusting rand() is the better way to go.
Since it's a random boolean, writing 2 tests (one for TRUE, one for FALSE) might suffice if the behavior doesn't depend on the past "random" results (it's not clear, to me at least, from the question).
In other words: if consecutive outcomes don't depend on each other, you might test a single TRUE scenario, a single FALSE scenario and may be just fine.

Unit testing a method that can have random behaviour

I ran across this situation this afternoon, so I thought I'd ask what you guys do.
We have a randomized password generator for user password resets and while fixing a problem with it, I decided to move the routine into my (slowly growing) test harness.
I want to test that passwords generated conform to the rules we've set out, but of course the results of the function will be randomized (or, well, pseudo-randomized).
What would you guys do in the unit test? Generate a bunch of passwords, check they all pass and consider that good enough?
A unit test should do the same thing every time that it runs, otherwise you may run into a situation where the unit test only fails occasionally, and that could be a real pain to debug.
Try seeding your pseudo-randomizer with the same seed every time (in the test, that is--not in production code). That way your test will generate the same set of inputs every time.
If you can't control the seed and there is no way to prevent the function you are testing from being randomized, then I guess you are stuck with an unpredictable unit test. :(
The function is a hypothesis that for all inputs, the output conforms to the specifications. The unit test is an attempt to falsify that hypothesis. So yes, the best you can do in this case is to generate a large amount of outputs. If they all pass your specification, then you can be reasonably sure that your function works as specified.
Consider putting the random number generator outside this function and passing a random number to it, making the function deterministic, instead of having it access the random number generator directly. This way, you can generate a large number of random inputs in your test harness, pass them all to your function, and test the outputs. If one fails, record what that value is so that you have a documented test case.
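A sketch of that arrangement (generate_password and the rules in check_rules are hypothetical; the point is that the RNG is injected and the seed is fixed, so a failure is reproducible):

import random

def check_rules(pw):
    # Hypothetical rules: 8-12 characters with at least one digit.
    return 8 <= len(pw) <= 12 and any(c.isdigit() for c in pw)

def test_generator_against_rules():
    rng = random.Random(1234)        # fixed seed: reproducible inputs
    for _ in range(10_000):
        pw = generate_password(rng)  # hypothetical: RNG passed in, not global
        assert check_rules(pw), 'document this failing case: %r' % pw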
In addition to testing a few to make sure that they pass, I'd write a test to make sure that passwords that break the rules fail.
Is there anything in the codebase that's checking the passwords generated to make sure they're random enough? If not, you might create the logic to check the generated passwords, test that logic, and then you can state that the random password generator is working (as "bad" ones won't get out).
Once you've got that logic you can probably write an integration type test that would generate boatloads of passwords and pass it through the logic, at which point you'd get an idea of how "good" your random password generate is.
Well, considering they are random, there is no real way to make sure, but testing 100,000 passwords should clear most doubts :)
You could seed your random number generator with a constant value in order to get non-random results and test those results.
I'm assuming that the user-entered passwords conform to the same restrictions as the randomly generated ones. So you probably want to have a set of static passwords for checking known conditions, and then you'll have a loop that does the dynamic password checks. The size of the loop isn't too important, but it should be large enough that you get that warm fuzzy feeling from your generator, but not so large that your tests take forever to run. If anything crops up over time, you can add those cases to your static list.
In the long run though, a weak password isn't going to break your program, and password security falls in the hands of the user. So your priority would be to make sure that the dynamic generation and strength-check doesn't break the system.
Without knowing what your rules are it's hard to say for sure, but assuming they are something like "the password must be at least 8 characters with at least one upper case letter, one lower case letter, one number and one special character", then it's impossible even with brute force to check sufficient quantities of generated passwords to prove the algorithm is correct (with roughly 70 usable characters there are about 70^8 = 5.76x10^14 eight-character passwords alone, depending on how many special characters you designate for use, and checking them all would take a very, very long time to complete).
Ultimately all you can do is test as many passwords as is feasible, and if any break the rules then you know the algorithm is incorrect. Probably the best thing to do is leave it running overnight, and if all is well in the morning you're likely to be OK.
If you want to be doubly sure in production, then implement an outer function that calls the password generation function in a loop and checks it against the rules. If it fails then log an error indicating this (so you know you need to fix it) and generate another password. Continue until you get one that meets the rules.
You can also look into mutation testing (Jester for Java, Heckle for Ruby)
In my humble opinion you do not want a test that sometimes passes and sometimes fails. Some people may even consider that this kind of test is not a unit test. But the main idea is to be sure that the function is OK when you see the green bar.
With this principle in mind, you may try to execute it a reasonable number of times, so that the chance of a false pass is almost zero. However, a single failure of the test will force you to make more extensive tests, apart from debugging the failure.
Either use a fixed random seed or make it reproducible (e.g. derive it from the current day).
Firstly, use a seed for your PRNG. Your input is then no longer random, which gets rid of the problem of unpredictable output - i.e. now your unit test is deterministic.
This doesn't however solve the problem of testing the implementation, but here is an example of how a typical method that relies upon randomness can be tested.
Imagine we've implemented a function that takes a collection of red and blue marbles and picks one at random, but a weighting can be assigned to the probability, i.e. weights of 2 and 1 would mean red marbles are twice as likely to be picked as blue marbles.
We can test this by setting the weight of one choice to zero and verifying that in all cases (in practice, for a large amount of test input) we always get e.g. blue marbles. Reversing the weights should then give the opposite result (all red marbles).
This doesn't guarantee our function is behaving as intended (if we pass in an equal number of red and blue marbles and have equal weights do we always get a 50/50 distribution over a large number of trials?) but in practice it is often sufficient.
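A sketch of that zero-weight check (weighted_pick is an illustrative implementation, not code from the original answer):

import random

def weighted_pick(rng, red_weight, blue_weight):
    # Returns 'red' with probability red_weight / (red_weight + blue_weight).
    return 'red' if rng.random() * (red_weight + blue_weight) < red_weight else 'blue'

def test_zero_weight_choice_is_never_picked():
    rng = random.Random(42)
    assert all(weighted_pick(rng, 0, 1) == 'blue' for _ in range(10_000))
    assert all(weighted_pick(rng, 1, 0) == 'red' for _ in range(10_000))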

Testing for Random Value - Thoughts on this Approach?

OK, I have been working on a random image selector and queue system (so you don't see the same images too often).
All was going swimmingly (as far as my crappy code does) until I got to the random bit. I wanted to test it, but how do you test for it? There is no Debug.Assert(i.IsRandom) (sadly) :D
So, I got my brain on it after watering it with some tea and came up with the following, I was just wondering if I could have your thoughts?
Basically I knew the random bit was the problem, so I ripped that out to a delegate (which would then be passed to the objects constructor).
I then created a class that pretty much performs the same logic as the live code, but remembers the value selected in a private variable.
I then threw that delegate to the live class and tested against that:
i.e.
Debug.Assert(myObj.RndVal == RndIntTester.ValuePassed);
But I couldn't help but think, was I wasting my time? I ran that through lots of iterations to see if it fell over at any time etc.
Do you think I was wasting my time with this? Or could I have got away with:
GateKiller's answer reminded me of this:
Update to Clarify
I should add that I basically never want to see the same result more than X number of times from a pool of Y size.
The addition of the test container basically allowed me to see if any of the previously selected images were "randomly" selected.
I guess technically the thing being tested here is not the RNG (since I never wrote that code) but the fact that I am expecting random results from a limited pool, and I want to track them.
Test from the requirement : "so you don't see the same images too often"
Ask for 100 images. Did you see an image too often?
There is a handy list of statistical randomness tests and related research on Wikipedia. Note that you won't know for certain that a source is truly random with most of these, you'll just have ruled out some ways in which it may be easily predictable.
If you have a fixed set of items, and you don't want them to repeat too often, shuffle the collection randomly. Then you will be sure that you never see the same image twice in a row, feel like you're listening to Top 20 radio, etc. You'll make a full pass through the collection before repeating.
Item[] foo = …
Random rng = new Random();
for (int idx = foo.length; idx > 1; --idx) {
    /* Pick random number from half-open interval [0, idx) */
    int rnd = rng.nextInt(idx);
    Item tmp = foo[idx - 1];
    foo[idx - 1] = foo[rnd];
    foo[rnd] = tmp;
}
If you have too many items to collect and shuffle all at once (10s of thousands of images in a repository), you can add some divide-and-conquer to the same approach. Shuffle groups of images, then shuffle each group.
A slightly different approach that sounds like it might apply to your revised problem statement is to have your "image selector" implementation keep its recent selection history in a queue of at most Y length. Before returning an image, it tests to see if it's already in the queue X times; if so, it randomly selects another, until it finds one that passes.
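A sketch of such a selector (names invented; it assumes the pool is large enough relative to X and Y that a fresh image always exists, otherwise the retry loop would never terminate):

import random
from collections import deque

def make_selector(images, max_repeats, history_len, rng=random):
    history = deque(maxlen=history_len)  # the last Y selections
    def next_image():
        choice = rng.choice(images)
        while history.count(choice) >= max_repeats:  # already shown X times?
            choice = rng.choice(images)
        history.append(choice)
        return choice
    return next_image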
If you are really asking about testing the quality of the random number generator, I'll have to open the statistics book.
It's impossible to test if a value is truly random or not. The best you can do is perform the test some large number of times and test that you got an appropriate distribution, but if the results are truly random, even this has a (very small) chance of failing.
If you're doing white box testing, and you know your random seed, then you can actually compute the expected result, but you may need a separate test to test the randomness of your RNG.
The generation of random numbers is too important to be left to chance. -- Robert R. Coveyou
To solve the psychological problem:
A decent way to prevent apparent repetitions is to select a few items at random from the full set, discarding duplicates. Play those, then select another few. How many is "a few" depends on how fast you're playing them and how big the full set is, but, for example, avoiding a repeat within the larger of 20 items or 5 minutes might be OK. Do user testing: as the programmer you'll be so sick of slideshows that you're not a good test subject.
To test randomising code, I would say:
Step 1: specify how the code MUST map the raw random numbers to choices in your domain, and make sure that your code correctly uses the output of the random number generator. Test this by mocking the generator (or seeding it with a known test value if it's a PRNG).
Step 2: make sure the generator is sufficiently random for your purposes. If you used a library function, you do this by reading the documentation. If you wrote your own, why?
Step 3 (advanced statisticians only): run some statistical tests for randomness on the output of the generator. Make sure you know what the probability is of a false failure on the test.
There are whole books one can write about randomness and evaluating if something appears to be random, but I'll save you the pages of mathematics. In short, you can use a chi-square test as a way of determining how well an apparently "random" distribution fits what you expect.
If you're using Perl, you can use the Statistics::ChiSquare module to do the hard work for you.
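If Perl isn't to hand, the same check is only a few lines in Python (a sketch assuming SciPy is installed; four equally likely "images" drawn 10,000 times):

import random
from scipy.stats import chisquare

rng = random.Random(7)
counts = [0, 0, 0, 0]
for _ in range(10_000):
    counts[rng.randrange(4)] += 1  # four equally likely outcomes

# Null hypothesis: a uniform distribution; a tiny p-value suggests bias.
stat, p = chisquare(counts)
assert p > 0.001, 'distribution looks suspiciously non-uniform'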
However if you want to make sure that your images are evenly distributed, then you probably won't want them to be truly random. Instead, I'd suggest you take your entire list of images, shuffle that list, and then remove an item from it whenever you need a "random" image. When the list is empty, you re-build it, re-shuffle, and repeat.
This technique means that given a set of images, each individual image can't appear more than once every iteration through your list. Your images can't help but be evenly distributed.
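A sketch of that shuffle-and-draw technique (illustrative Python, not code from the original answer):

import random

class ShuffleBag(object):
    def __init__(self, images, rng=random):
        self._images = list(images)
        self._rng = rng
        self._pool = []
    def next(self):
        if not self._pool:  # list empty: rebuild and re-shuffle
            self._pool = list(self._images)
            self._rng.shuffle(self._pool)
        return self._pool.pop()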
All the best,
Paul
What Random and similar functions give you are but pseudo-random numbers: a series of numbers produced by a function. Usually, you give that function its first input parameter (a.k.a. the "seed"), which is used to produce the first "random" number. After that, each last value is used as the input parameter for the next iteration of the cycle. You can check the Wikipedia article on "Pseudorandom number generator"; the explanation there is very good.
All of these algorithms have something in common: the series repeats itself after a number of iterations. Remember, these aren't truly random numbers, only series of numbers that seem random. To select one generator over another, you need to ask yourself: What do you want it for?
Can you test randomness? Indeed you can. There are plenty of tests for that. The first and most simple is, of course, to run your pseudo-random number generator an enormous number of times and compile the number of times each result appears. In the end, each result should have appeared a number of times very close to (number of iterations)/(number of possible results). The greater the standard deviation from this, the worse your generator is.
The second is: how many random numbers are you using at a time? 2, 3? Take them in pairs (or triplets) and repeat the previous experiment: after a very long number of iterations, each expected result should have appeared at least once, and again the number of times each result has appeared shouldn't be too far away from the expected. There are some generators which work just fine taking one or two at a time, but fail spectacularly when you're taking three or more (RANDU, anyone?).
There are other, more complex tests: some involve plotting the results on a logarithmic scale, or onto a plane with a circle in the middle and then counting how many of the points fell within it, others... I believe those two above should suffice most of the time (unless you're a finicky mathematician).
Random is Random. Even if the same picture shows up 4 times in a row, it could still be considered random.
My opinion is that anything random cannot be properly tested.
Sure you can attempt to test it, but there are so many combinations to try that you are better off just relying on the RNG and spot checking a large handful of cases.
Well, the problem is that random numbers by definition can repeat (because they are... wait for it: random). Maybe what you want to do is save the latest random number and compare the newly calculated one to it, and if they're equal just calculate another... but now your numbers are less random (I know there's no such thing as "more or less" randomness, but let me use the term just this time), because they are guaranteed not to repeat.
Anyway, you should never give random numbers so much thought. :)
As others have pointed out, it is impossible to really test for randomness. You can (and should) have the randomness contained to one particular method, and then write unit tests for every other method. That way, you can test all of the other functionality, assuming that you can get a random number out of that one last part.
Store the random values, and before you use the next generated random number, check it against the stored values.
Any good pseudo-random number generator will let you seed the generator. If you seed the generator with same number, then the stream of random numbers generated will be the same. So why not seed your random number generator and then create your unit tests based on that particular stream of numbers?
To get a series of non-repeating random numbers:
1. Create a list of random numbers.
2. Add a sequence number to each random number.
3. Sort the sequenced list by the original random number.
4. Use the sequence numbers, in the sorted order, as your new random numbers.
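A sketch of those steps in Python (illustrative):

import random

def non_repeating_sequence(n, rng=random):
    keyed = [(rng.random(), seq) for seq in range(n)]  # steps 1 and 2
    keyed.sort()                                       # step 3: sort by the random key
    return [seq for _, seq in keyed]                   # step 4: sequence numbers, reordered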
Don't test the randomness; test to see if the results you're getting are desirable (or, rather, try to get undesirable results a few times before accepting that your results are probably going to be desirable).
It will be impossible to ensure that you'll never get an undesirable result if you're testing a random output, but you can at least increase the chances that you'll notice it happening.
I would either take N pools of Y size, checking for any results that appear more than X number of times, or take one pool of N*Y size, checking every group of Y size for any result that appears more than X times (1 to Y, 2 to Y + 1, 3 to Y + 2, etc). What N is would depend on how reliable you want the test to be.
Random numbers are generated from a distribution. In this case, every value should have the same probability of appearing. If you calculated an infinite number of randoms, you would get the exact distribution.
In practice, call the function many times and check the results. If you expect to have N images, calculate 100*N randoms, then count how many of each expected number were found. Most should appear 70-130 times. Re-run the test with a different random seed to see if the results are different.
If you find the generator you use now is not good enough, you can easily find something better. Google for "Mersenne Twister" - that is much more random than you'll ever need.
To avoid images re-appearing, you need something less random. A simple approach would be to check for the disallowed values; if the new number is one of those, re-calculate.
Although you cannot test for randomness, you can test for the correlation, or distribution, of a sequence of numbers.
Hard to test goal: Each time we need an image, select 1 of 4 images at random.
Easy to test goal: For every 100 images we select, each of the 4 images must appear at least 20 times.
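A sketch of the easy-to-test restatement, scaled up so that a fair generator passes with near certainty (the counts and thresholds are illustrative):

import random
from collections import Counter

def test_each_image_gets_a_fair_share():
    rng = random.Random(2024)
    picks = Counter(rng.randrange(4) for _ in range(10_000))
    # Expected roughly 2,500 each; 2,000 is so far below that a fair
    # generator essentially never fails this check.
    assert all(picks[i] >= 2_000 for i in range(4))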
I agree with Adam Rosenfield. For the situation you're talking about, the only thing you can usefully test for is distribution across the range.
The situation I usually encounter is that I'm generating pseudorandom numbers with my favourite language's PRNG, and then manipulating them into the desired range. To check whether my manipulations have affected the distribution, I generate a bunch of numbers, manipulate them, and then check the distribution of the results.
To get a good test, you should generate at least a couple orders of magnitude more numbers than your range holds. The more values you use, the better the test. Obviously if you have a really large range, this won't work since you'll have to generate far too many numbers. But in your situation it should work fine.
Here's an example in Perl that illustrates what I mean:
my %dist;
for (my $i = 0; $i < 100000; $i++) {
    my $r = rand;          # Get the random number
    $r = int($r * 1000);   # Move it into the desired range
    $dist{$r}++;           # Count the occurrences of each number
}
print "Min occurrences: ", (sort { $a <=> $b } values %dist)[0], "\n";
print "Max occurrences: ", (sort { $b <=> $a } values %dist)[0], "\n";
If the spread between the min and max occurrences is small, then your distribution is good. If it's wide, then your distribution may be bad. You can also use this approach to check whether your range was covered and whether any values were missed.
Again, the more numbers you generate, the more valid the results. I tend to start small and work up to whatever my machine will handle in a reasonable amount of time, e.g. five minutes.
Supposing you are testing a range for randomness within integers, one way to verify this is to create a gajillion (well, maybe 10,000 or so) 'random' numbers and plot their occurrence on a histogram.
****** ****** ****
***********************************************
*************************************************
*************************************************
*************************************************
*************************************************
*************************************************
*************************************************
*************************************************
*************************************************
1 2 3 4 5
12345678901234567890123456789012345678901234567890
The above shows a relatively even distribution.
If it looked more skewed, such as this:
****** ****** ****
************ ************ ************
************ ************ ***************
************ ************ ****************
************ ************ *****************
************ ************ *****************
*************************** ******************
**************************** ******************
******************************* ******************
**************************************************
1 2 3 4 5
12345678901234567890123456789012345678901234567890
Then you can see there is less randomness. As others have mentioned, there is the issue of repetition to contend with as well.
If you were to write a binary file of, say, 10,000 random numbers from your generator (say, random numbers from 1 to 1024) and try to compress that file using some compression (zip, gzip, etc.), then you could compare the two file sizes. If there is 'lots' of compression, then it's not particularly random. If there isn't much of a change in size, then it's 'pretty random'.
Why this works
The compression algorithms look for patterns (repetition and otherwise) and reduce them in some way. One way to look at these compression algorithms is as a measure of the amount of information in a file. A file that compresses heavily had little information (i.e. little randomness); a file that barely compresses had a lot of information (randomness).
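A sketch of that compression check (zlib stands in for zip/gzip; the 0.95 threshold is an illustrative guess, not a calibrated figure):

import os
import zlib

def looks_random(data, threshold=0.95):
    # Data that barely compresses carries close to the maximum information.
    ratio = len(zlib.compress(data, 9)) / len(data)
    return ratio > threshold

print(looks_random(os.urandom(10_000)))  # True: random bytes barely compress
print(looks_random(b'abc' * 3_333))      # False: repetition compresses well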