Using random numbers vs hardcoded values in unit tests [duplicate]

When I write tests, I like to use random numbers to calculate things.
e.g.
package mypkg

import (
    "math/rand"
    "testing"
    "time"
)

func init() {
    rand.Seed(time.Now().UnixNano())
}

func TestXYZ(t *testing.T) {
    amount := rand.Intn(100)
    cnt := 1 + rand.Intn(10)
    for i := 0; i < cnt; i++ {
        doSmth(amount)
    }
    // more stuff
}
which of course has the disadvantage that the expected value for the test needs to be calculated from the random values:
expected := calcExpected(amount, cnt)
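For illustration, a calcExpected for this hypothetical doSmth might look something like the following sketch (both names come from the snippet above; the accumulating behavior is assumed), which already hints at the duplication problem discussed in the answer below:

// Hypothetical sketch: calcExpected has to re-derive whatever doSmth
// does cnt times to amount, duplicating the production logic.
func calcExpected(amount, cnt int) int {
    total := 0
    for i := 0; i < cnt; i++ {
        total += amount // assumes doSmth accumulates amount somewhere
    }
    return total
}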
I have received criticism for this approach:
It makes the test unnecessarily complex
It is less reproducible due to the randomness
I think though that without randomness, I could actually:
Make up my results, e.g. the test only works for a specific value. Randomness proves my test is "robust"
Catch more edge cases (debatable as edge cases are usually specific, e.g. 0,1,-1)
Is it really that bad to use random numbers?
(I realize this is a bit of an opinion question, but I am very much interested in people's point of views, don't mind downvotes).

Your question is not specific to Go. This applies to any language and really any kind of unit testing.
the expected value for the test needs to be calculated from the random values.
That is the major problem. If you have even moderately complicated application logic, then when that logic changes in the application, you also have to change the same logic in your tests. You have to implement every change twice.
Presumably each implementation is equally complex and somewhat different, because if you just copy-paste or reuse code from your app in your test to compute the expected value, the two will always agree and the test is pointless.
Testing with fixed values in your unit tests keeps the tests simple and exercises the code.
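As a minimal sketch of that, in Go like the question (Sum and its values are invented stand-ins, not code from the question):

package example

import "testing"

// Sum is a stand-in for the code under test.
func Sum(a, b int) int { return a + b }

// Fixed inputs with hand-computed expected values: no logic is
// duplicated in the test, and a failure is trivially reproducible.
func TestSumFixed(t *testing.T) {
    if got := Sum(1, 2); got != 3 {
        t.Errorf("Sum(1, 2) = %d, want 3", got)
    }
    if got := Sum(-1, 1); got != 0 {
        t.Errorf("Sum(-1, 1) = %d, want 0", got)
    }
}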
Testing with random values is done; it's called fuzzing. I am not an expert in fuzzing. Testing with random values is one aspect of it, but the nuance is in choosing random values that are likely to find edge cases, catch bugs, exercise unused code branches, or uncover leaks.
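For what it's worth, Go, the language in the question, has had fuzzing built into the testing package since Go 1.18. A minimal sketch, with Reverse as an invented stand-in for the code under test:

package example

import "testing"

// Reverse is a stand-in; it reverses a byte string.
func Reverse(s string) string {
    b := []byte(s)
    for i, j := 0, len(b)-1; i < j; i, j = i+1, j-1 {
        b[i], b[j] = b[j], b[i]
    }
    return string(b)
}

// FuzzReverse checks a property (reversing twice is the identity)
// against fuzzer-generated inputs. Run with: go test -fuzz=FuzzReverse
// A plain "go test" only runs the seed corpus entries.
func FuzzReverse(f *testing.F) {
    f.Add("hello") // seed corpus entry
    f.Fuzz(func(t *testing.T, s string) {
        if got := Reverse(Reverse(s)); got != s {
            t.Errorf("Reverse(Reverse(%q)) = %q", s, got)
        }
    })
}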

Related

Is it possible to write a unit test that covers everything?

Let's say I have a function
bool f(int x) {
    if (x < 10) return true;
    return false;
}
Ideally, you would write 2^32 test cases to cover everything from INT_MIN to INT_MAX. Of course this is not practical.
To make life easier, we write test cases for
x < 10, test x = 9 expect true
x == 10, test x = 10 expect false
x > 10, test x = 11 expect false
These test cases are fine, but they do not cover every case. Let's say one day someone modifies the function to be
bool f(int x) {
    if (x == 12) return true;
    if (x < 10) return true;
    return false;
}
He will run the tests and see that they all pass. How do we make sure we cover every scenario without going to extremes? Is there a keyword for the issue I am describing?
This is partly a comment, partly an answer, because of the way you phrased the question.
The comment
Is it possible to write a unit test that covers everything?
No. Even in your example you limit the test cases to 2^32, but what if the code is moved to a 64-bit system and then someone adds a line using 2^34 or something?
Also, your question indicates to me that you are thinking of static test cases with dynamic code; dynamic in the sense that the code is changed over time by a programmer, not that it is dynamically modified by the code itself. You should be thinking of dynamic test cases with dynamic code.
Lastly, you did not note whether this is white-, gray-, or black-box testing.
The answer
Let a tool analyze the code and generate the test data.
See: A Survey on Automatic Test Data Generation
You also asked about keywords for searching.
Here is a Google search for this that I found of value:
code analysis automated test generation survey
I have never used one of these test-case tools myself, as I use Prolog DCGs to generate my test cases; in a current project I generate millions of test cases in about two minutes and run them in a few minutes more. Some of the failing test cases are ones I would never have thought up on my own, so some may consider this overkill, but it works.
Since many people don't know Prolog DCGs, here is a similar approach explained using C# with LINQ by Eric Lippert: Every Binary Tree There Is.
No, there's not currently a general algorithm for this that doesn't involve some kind of very intensive computation (e.g. testing lots and lots of cases), but you can write your unit tests in such a way that they'll have a higher probability of failing in the case of a change to the method. For example, in the answer given, write a test for x = 10. For the other two cases, first pick a couple of random numbers between 11 and int.Max and test those. Then test a couple of random numbers between int.Min and 9. The test wouldn't necessarily fail after the modification you describe, but there's a better chance that it would fail than if you had just hardcoded the value.
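A sketch of that suggestion (in Go for consistency with the first question; lessThanTen is a stand-in for the function above, and the fixed seed keeps the run reproducible):

package example

import (
    "math"
    "math/rand"
    "testing"
)

// lessThanTen is a stand-in for the function in the question.
func lessThanTen(x int) bool { return x < 10 }

// Pin the boundary exactly, then sample random values on either side.
// (Assumes the usual 64-bit int, so the int32 range arithmetic below
// cannot overflow.)
func TestLessThanTen(t *testing.T) {
    if !lessThanTen(9) {
        t.Error("lessThanTen(9) = false, want true")
    }
    if lessThanTen(10) {
        t.Error("lessThanTen(10) = true, want false")
    }
    r := rand.New(rand.NewSource(1)) // fixed seed: same values every run
    for i := 0; i < 4; i++ {
        above := 11 + r.Intn(math.MaxInt32-10) // in [11, MaxInt32]
        if lessThanTen(above) {
            t.Errorf("lessThanTen(%d) = true, want false", above)
        }
        below := math.MinInt32 + r.Intn(10-math.MinInt32) // in [MinInt32, 9]
        if !lessThanTen(below) {
            t.Errorf("lessThanTen(%d) = false, want true", below)
        }
    }
}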
Also, as @GuyCoder pointed out in his excellent answer, even if you did try to do something like that, it's remarkably difficult (or impossible) to prove that there are no possible changes to a method that would break your test.
Also, keep in mind that no kind of test automation (including unit testing) is a foolproof method of testing; even under ideal conditions, you generally can't 100% prove that your program is correct. Keep in mind that virtually all software testing approaches are fundamentally empirical methods and empirical methods can't really achieve 100% certainty. (They can achieve a good deal of certainty, though; in fact, many scientific papers achieve 95% certainty or higher - sometimes much higher - so in cases like that the difference may not be all that important). For example, even if you have 100% code coverage, how do you know that there's not an error in the tests somewhere? Are you going to write tests for the tests? (This can lead to a turtles all the way down type situation).
If you want to get really literal about it and you buy into David Hume, you really can't ever be 100% sure about something based on empirical testing; the fact that a test has passed every time you've run it doesn't mean that it'll continue to pass in the future. I digress, though.
If you're interested, formal verification studies methods of deductively proving that software (or, at least, certain aspects of it) is correct. The major issue is that it tends to be very difficult or impossible to formally verify a complete system of any complexity, especially if you're using third-party libraries that aren't formally verified. (Those, along with the difficulty of learning the techniques in the first place, are some of the main reasons that formal verification hasn't really taken off outside of academia and certain very narrow industry applications.)
A final point: software ships with bugs. You'd be hard-pressed to find any complicated system that was 100% defect-free at the time that it was released. As I mentioned above, there is no currently-known technique to guarantee that your testing found all of the bugs (and if you can find one you'll become a very wealthy individual), so for the most part you'll have to rely on statistical measures to know whether you've tested adequately.
TL;DR No, you can't, and even if you could you still couldn't be 100% sure that your software was correct (there might be a bug in your tests, for example). For the foreseeable future, your unit test cases will need maintenance too. You can write the tests to be more resilient against changes, though.

How do I write a unit test when the class to test is complicated?

I am trying to employ TDD in writing a backgammon game in C++ using VS 2010.
I have set up CxxTest to write the test cases.
The first class to test is
class Position
{
public:
    ...
    ...
    bool IsSingleMoveValid(.....);
    ...
    ...
};
I'd like to write a test for the function IsSingleMoveValid(), and I guess the test should prove that the function works correctly. Unfortunately there are so many cases to test, and even if I test several cases, some might escape.
What do you suggest ? How does TDD handle these problems ?
A few guidelines:
Test regular cases. In your problem: test legal moves that you KNOW are valid. You can either take the easy way and have only a handful of test cases, or you can write a loop generating all possible legal moves that can occur in your application and test them all.
Test boundary cases. This is not really applicable to your problem, but for testing simple numerical functions of the form f(x) where you know that x has to lie in a range [x_min, x_max), you would typically also test f(x_min-1), f(x_min), f(x_max-1), f(x_max); a sketch follows after these guidelines. (It could be relevant for board games if you have an internal board representation with an overflow edge around it.)
Test known bugs. If you ever come across a legal move that is not recognized by your IsSingleMoveValid(), you add this as a testcase and then fix your code. It's useful to keep such test cases to guard against future regressions (some future code additions/modifications could re-introduce this bug, and the test will catch it).
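A sketch of the boundary-case guideline above, with invented names (inRange, xMin, xMax) standing in for a function defined on a range [x_min, x_max):

package example

import "testing"

const (
    xMin = 0   // assumed lower bound (inclusive)
    xMax = 100 // assumed upper bound (exclusive)
)

// inRange is a stand-in for f(x); it reports whether x is in [xMin, xMax).
func inRange(x int) bool { return x >= xMin && x < xMax }

// The four canonical boundary probes: just below, first inside,
// last inside, and first outside.
func TestBoundaries(t *testing.T) {
    cases := []struct {
        x    int
        want bool
    }{
        {xMin - 1, false},
        {xMin, true},
        {xMax - 1, true},
        {xMax, false},
    }
    for _, c := range cases {
        if got := inRange(c.x); got != c.want {
            t.Errorf("inRange(%d) = %v, want %v", c.x, got, c.want)
        }
    }
}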
Test coverage (the percentage of code lines covered by tests) can be measured by tools such as gcov. You should do your own cost-benefit analysis of how thoroughly you want to test your code, but for something as essential as legal-move detection in a game program, I'd suggest being vigilant here.
Others have already commented on breaking up the tests into smaller subtests. The nomenclature for that is that such isolated functions are tested with unit testing, whereas the collaboration between such functions in higher-level code is tested with integration testing.
Generally, by breaking complex classes into multiple simpler classes, each doing a well-defined task that is easy to test.
If you are writing the tests then the easiest thing to do is to break your IsSingleMoveValid function down into smaller functions and test them individually.
As you can see on Wikipedia, TDD - Test Driven Development means writing the test first.
In your case, that would mean establishing all valid moves and writing a test function for them. Then you write code for each of those failing tests until all the tests pass.
... Unfortunately there are so many cases to test and even if I test several cases, some might escape.
As other said, when a function is too complex it is time for Refactoring!
I strongly suggest the book Refactoring: Improving the Design of Existing Code by Martin Fowler, with contributions from Kent Beck and others. It is both a learning and a reference book, which makes it very valuable in my opinion.
This is probably the best book on refactoring and it will teach you how to split your function without breaking everything. Also, refactoring is a really important asset for TDD. :)
There is no such thing as "too many cases to test". If the code handling a set of cases can be written, those cases can be thought through; and if they can be thought through, the code that tests them can be written as well. On average, for every 10 lines of (testable) code you write, you can add a constant factor of testing code to go with it.
Of course, the whole trick is knowing how to write code that matches the testable description.
Hence, you need to start by writing a test for all the cases.
Even if there is a big (say, for the sake of discussion, countable) set of possible cases to test (e.g. that add(n,m) == n+m for all integers n and m), your actual code may be really simple: return n+m. This is trivially true, but don't miss the point: you don't need to test all the possible moves on the board. TDD aims for your tests to cover all the code (i.e. the tests exercise all the if branches in your code), not necessarily all possible values or combinations of states (which are exponentially many).
A project with 80-90% line coverage means that your tests exercise roughly 9 out of every 10 lines of your code. In general, if there is a bug in your code, it will in the majority of circumstances show up when a particular code path is walked.

What to test when writing Unit Tests?

I want to begin unit testing our application, because I believe that this is the first step to developing a good relationship with testing and will allow me to branch into other forms of testing, most interestingly BDD with Cucumber.
We currently generate all of our Base classes using CodeSmith, based entirely on the tables in a database. I am curious about the benefits of generating test cases for these Base classes. Is this poor testing practice?
This leads me to the ultimate question of my post. What do we test when using Unit Tests?
Do we test the examples we know we want out, or do we test the examples we do not want?
There can be methods that have multiple ways of failing and multiple ways of succeeding; how do we know when to stop?
Take a summing function, for example. Give it 1 and 2 and expect 3 in the only unit test... how do we know that 5 and 6 isn't coming back as 35?
Question Recap
Generating unit tests (Good/Bad)
What/How much do we test?
Start with your requirements and write tests that test the expected behavior. From that point on, how many other scenarios you test can be driven by your schedule, or maybe by your recognizing non-success scenarios that are particularly high-risk.
You might consider writing non-success tests only in response to defects you (or your users) discover (the idea being that you write a test that tests the defect fix before you actually fix the defect, so that your test will fail if that defect is re-introduced into your code in future development).
The point of unit tests is to give you confidence (but only in special cases does it give you certainty) that the actual behavior of your public methods matches the expected behavior. Thus, if you have a class Adder
class Adder { public int Add(int x, int y) { return x + y; } }
and a corresponding unit test
[Test]
public void Add_returns_that_one_plus_two_is_three() {
    Adder a = new Adder();
    int result = a.Add(1, 2);
    Assert.AreEqual(3, result);
}
then this gives you some (but not 100%) confidence that the method under test is behaving appropriately. It also gives you some defense against breaking the code upon refactoring.
What do we test when using Unit Tests?
The actual behavior of your public methods against the expected (or specified) behavior.
Do we test the examples we know we want out?
Yes, one way to gain confidence in the correctness of your method is to take some input with known expected output, execute the public method on the input, and compare the actual output to the expected output.
What to test: Everything that has ever gone wrong.
When you find a bug, write a test for the buggy behavior before you fix the code. Then, when the code is working correctly, the test will pass, and you'll have another test in your arsenal.
1) To start, I'd recommend you test your app's core logic.
2) Then, use the code coverage tool in VS to see whether all of your code is exercised by the tests (all branches of if-else and case conditions are invoked).
This is some sort of an answer to your question about testing 1+2=3 and 5+6=35: when the code is covered, you can feel safe with further experiments.
3) It's good practice to cover 80-90% of the code; testing the rest is usually inefficient: getters and setters, one-line exception handling, etc.
4) Learn about separation of concerns.
5) Generating unit tests - try it, and you'll see that it can save you a good number of manually written lines. I prefer generating the test file with VS, then writing the rest of the test methods myself.
You unit-test things where you
want to make sure your algorithm works
want to safeguard against accidental changes in the future
So in your example it would not make much sense to test the generated classes. Test the generator instead.
It's good practice to test the main use cases (what the tested function was designed for) first. Then you test the main error cases. Then you write tests for corner cases (i.e. lower and upper bounds). The unusual error cases are normally so hard to produce that it doesn't make sense to unit-test them.
If you need to verify a large range of parameter sets, use data-driven testing.
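In Go, for example, data-driven testing is commonly written as a table of cases driving named subtests; a sketch with an invented Classify function:

package example

import "testing"

// Classify is a stand-in for the function under test.
func Classify(x int) string {
    switch {
    case x < 0:
        return "negative"
    case x == 0:
        return "zero"
    default:
        return "positive"
    }
}

// Each table row becomes a named subtest, so large parameter sets
// stay readable and a failure pinpoints the exact row.
func TestClassify(t *testing.T) {
    cases := []struct {
        name string
        x    int
        want string
    }{
        {"negative", -3, "negative"},
        {"zero", 0, "zero"},
        {"positive", 7, "positive"},
    }
    for _, c := range cases {
        t.Run(c.name, func(t *testing.T) {
            if got := Classify(c.x); got != c.want {
                t.Errorf("Classify(%d) = %q, want %q", c.x, got, c.want)
            }
        })
    }
}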
How many things you test is a matter of effort vs. return, so it really depends on the individual project. Normally you try to follow the 80/20 rule, but there may be applications where you need more test coverage because a failure would have very serious consequences.
You can dramatically reduce the time you need to write tests if you use a test-driven approach (TDD). That's because code that isn't written with testability in mind is much harder, sometimes near to impossible to test. But since nothing in life is free, the code developed with TDD tends to be more complex itself.
I'm also beginning the process of more consistently using unit tests and what I've found is that the biggest task in unit testing is structuring my code to support testing. As I start to think about how to write tests, it becomes clear where classes have become overly coupled, to the point that the complexity of the 'unit' makes defining tests difficult. I spend as much or more time refactoring my code as I do writing tests. Once the boundaries between testable units become clearer, the question of where to start testing resolves itself; start with your smallest isolated dependencies (or at least the ones you're worried about) and work your way up.
There are three basic events I test for:
min, max, and somewhere between min and max.
And where appropriate two extremes: below min, and above max.
There are obvious exceptions (some code may not have a min or max for example) but I've found that unit testing for these events is a good start and captures a majority of "common" issues with the code.

Using randomness and/or iterations in unit tests?

In unit tests, I have become used to testing methods by applying some regular values, some values offending the method contract, and all border cases I can come up with.
But is it very bad practice to
test on random values, i.e. a value within a range that you think should never give any trouble, so that each time the test runs a different value is passed in, as a kind of extensive testing of regular values?
test on whole ranges, using iteration?
I have a feeling neither of these approaches is any good. With range testing I can imagine it's just not practical because of the time it takes, but what about randomness?
UPDATE :
I'm not using this technique myself; I was just wondering about it. Randomness can be a good tool, I know now, if you can make it reproducible when you need to.
The most interesting reply was the 'fuzzing' tip from Lieven :
http://en.wikipedia.org/wiki/Fuzz_testing
tx
Unit tests need to be fast; if they aren't, people won't run them regularly. At times I did write code checking a whole range, but @Ignore'd or commented it out in the end because it made the tests too slow. If I were to use random values, I would go for a PRNG with fixed seeds, so that every run actually checks the same numbers.
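A sketch of that fixed-seed approach in Go (double is an invented stand-in for the code under test):

package example

import (
    "math/rand"
    "testing"
)

// double is a stand-in for the code under test.
func double(x int) int { return x * 2 }

// The PRNG is built from a constant seed, so the "random" values are
// identical on every run: fast, repeatable, and still varied.
func TestDoubleFixedSeed(t *testing.T) {
    r := rand.New(rand.NewSource(42))
    for i := 0; i < 100; i++ {
        x := r.Intn(1000)
        if got := double(x); got != 2*x {
            t.Errorf("double(%d) = %d, want %d", x, got, 2*x)
        }
    }
}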
Random input: the tests would not be repeatable (i.e. produce consistent results every time they are run) and hence are not considered good unit tests. Tests should not change their mind.
Range tests / RowTests: these are good as long as they don't slow down the test suite run; each test should run as fast as possible, preferably 100 ms or less (a done-in-30-seconds test suite gets run more often than a 10-minute one). That said, each input (test data) should be 'representative'. If all input values are treated the same, testing each one adds no value and is just routine number crunching; you just need one representative from that set of values. You also need representatives for boundary conditions and 'special' values.
For more on guidelines or thumbrules - see 'What makes a Good Unit Test?'
That said, the techniques you mentioned could be great for finding representative inputs. So use them to find a scenario X where the code fails or succeeds incorrectly; then write a repeatable, quick, tests-one-thing-only unit test for that scenario X and add it to your test suite. If you find that these tools continue to help you find more good test cases, persist with them.
Response to OP's clarification:
If you use the same seed value (test input) for your random number generator on each test run, your test is not random; the values can be predetermined. However, a unit test ideally shouldn't need any input/output at all; that is why xUnit test cases have the void TC() signature.
If you use different seed values on each run, now your tests are random and not repeatable. Of course you can hunt down the special seed value in your log files to know what failed (and reproduce the error) but I like my tests to instantly let me know what failed - e.g. a Red TestConversionForEnums() lets me know that the Enum Conversion code is broken without any inspection.
Repeatable implies that each time the test is run on the SUT, it produces the same result (pass/fail), not 'Can I reproduce the test failure again?' (repeatable != reproducible). To reiterate: this kind of exploratory testing may be good for identifying more test cases, but I wouldn't add it to the test suite that I run each time I make a code change during the day. I'd recommend doing exploratory testing manually; find some good (some may say sadistic) testers that'll go hammer and tongs at your code. They will find you more test cases than a random input generator.
I have been using randomness in my test cases. It found me some errors in the SUT, and it gave me some errors in my test cases.
Note that the test cases get more complex by using randomness:
You'll need a method to re-run your test case with the random value(s) it failed on
You'll need to log the random values used for every test.
...
All in all, I'm throttling back on using randomness but not dismissing it entirely. As with every technique, it has its uses.
For a better explanation of what you are after, look up the term fuzzing
What you describe is usually called specification-based testing and has been implemented by frameworks such as QuickCheck (Haskell), scalacheck (Scala) and Quviq QuickCheck (Erlang).
Data-based testing tools (such as DataProvider in TestNG) can achieve similar results.
The underlying principle is to generate input data for the subject under test based upon some sort of specification and is far from "bad practice".
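Go's standard library even ships a small package in this spirit, testing/quick, which generates inputs for a property function; a minimal sketch checking that addition is commutative:

package example

import (
    "testing"
    "testing/quick"
)

// quick.Check feeds generated inputs to the property function and
// reports the first input for which it returns false.
func TestAddCommutative(t *testing.T) {
    property := func(a, b int) bool {
        return a+b == b+a
    }
    if err := quick.Check(property, nil); err != nil {
        t.Error(err)
    }
}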
What are you testing? The random number generator? Or your code?
If your code, what if there is a bug in the code that produces random numbers?
What if you need to reproduce a problem, do you keep restarting the test hoping that it will eventually use the same sequence as you had when you discovered the problem?
If you decide to use a random number generator to produce data, at least seed it with a known constant value, so it's easy to reproduce.
In other words, your "random numbers" are just a "sequence of numbers I really don't care all that much about".
So long as it will tell you in some way what random value it failed on I don't suppose it's that bad. However, you're almost relying on luck to find a problem in your application then.
Testing the whole range will ensure you have every avenue covered but it seems like overkill when you have the edges covered and, I assume, a few middle-ground accepted values.
The goal of unit-testing is to get confidence in your code. Therefore, if you feel that using random values could help you find some more bugs, you obviously need more tests to increase your confidence level.
In that situation, you could rely on iteration-based testing to identify those problems.
I'd recommend creating new specific tests for the cases discovered with the loop testing, and removing the iteration-based tests then; so that they don't slow down your tests.
I have used randomness to debug a field problem with a state machine leaking a resource. We inspected the code, ran the unit tests, and couldn't reproduce the leak.
We fed random events from the entire possible event space into the state machine unit test environment. We looked at the invariants after each event and stopped when they were violated.
The random events eventually exposed a sequence of events that produced a leak.
The state machine leaked a resource when a 2nd error occurred while recovering from a first error.
We were then able to reproduce the leak in the field.
So randomness found a problem that was difficult to find otherwise. A little brute force but the computer didn't mind working the weekend.
I wouldn't advocate completely random values as it will give you a false sense of security. If you can't go through the entire range (which is often the case) it's far more efficient to select a subset by hand. This way you will also have to think of possible "odd" values, values that causes the code to run differently (and are not near edges).
You could use a random generator to generate the test values, check that they represent a good sample and then use them. This is a good idea especially if choosing by hand would be too time-consuming.
I did use random test values when I wrote a semaphore driver to use for a hw block from two different chips. In this case I couldn't figure out how to choose meaningful values for the timings so I randomized how often the chips would (independently) try to access the block. In retrospect it would still have been better to choose them by hand, because getting the test environment to work in such a way that the two chips didn't align themselves was not as simple as I thought. This was actually a very good example of when random values do not create a random sample.
The problem was caused by the fact that whenever one chip had reserved the block, the other waited and, true to semaphore behavior, got access right after the first released it. When I plotted how long the chips had to wait for access, the values were in fact far from random. It was worst when both random values had the same range; it got slightly better after I changed them to have different ranges, but it still wasn't very random. I only started getting something like a random test after I randomized both the waiting times between accesses and how long the block was reserved, and chose the four sets carefully.
In the end I probably ended up using more time writing the code to use "random" values than I would have used to pick meaningful values by hand in the first place.
See David Saff's work on Theory-Based Testing.
Generally I'd avoid randomness in unit tests, but the theory stuff is intriguing.
The 'key' point here is unit test. A slew of random values in the expected range, as well as edges for the good case and values outside the range/boundary for the bad case, is valuable in a regression test, provided the seed is constant.
A unit test may use random values in the expected range if it is possible to always save the inputs/outputs (if any) before and after.

Random data in Unit Tests?

I have a coworker who writes unit tests for objects which fill their fields with random data. His reason is that it gives a wider range of testing, since it will test a lot of different values, whereas a normal test only uses a single static value.
I've given him a number of different reasons against this, the main ones being:
random values means the test isn't truly repeatable (which also means that if the test can randomly fail, it can do so on the build server and break the build)
if it's a random value and the test fails, we need to a) fix the object and b) force ourselves to test for that value every time, so we know it works, but since it's random we don't know what the value was
Another coworker added:
If I am testing an exception, random values will not ensure that the test ends up in the expected state
random data is used for flushing out a system and load testing, not for unit tests
Can anyone else add additional reasons I can give him to get him to stop doing this?
(Or alternately, is this an acceptable method of writing unit tests, and I and my other coworker are wrong?)
There's a compromise. Your coworker is actually onto something, but I think he's doing it wrong. I'm not sure that totally random testing is very useful, but it's certainly not invalid.
A program (or unit) specification is a hypothesis that there exists some program that meets it. The program itself is then evidence of that hypothesis. What unit testing ought to be is an attempt to provide counter-evidence to refute that the program works according to the spec.
Now, you can write the unit tests by hand, but it really is a mechanical task. It can be automated. All you have to do is write the spec, and a machine can generate lots and lots of unit tests that try to break your code.
I don't know what language you're using, but see here:
Java
http://functionaljava.org/
Scala (or Java)
http://github.com/rickynils/scalacheck
Haskell
http://www.cs.chalmers.se/~rjmh/QuickCheck/
.NET:
http://blogs.msdn.com/dsyme/archive/2008/08/09/fscheck-0-2.aspx
These tools will take your well-formed spec as input and automatically generate as many unit tests as you want, with automatically generated data. They use "shrinking" strategies (which you can tweak) to find the simplest possible test case to break your code and to make sure it covers the edge cases well.
Happy testing!
This kind of testing is called a Monkey test. When done right, it can smoke out bugs from the really dark corners.
To address your concerns about reproducibility: the right way to approach this is to record the failed test entries and generate a unit test which probes for the entire family of the specific bug, including the one specific input (from the random data) which caused the initial failure.
There is a half-way house here which has some use, which is to seed your PRNG with a constant. That allows you to generate 'random' data which is repeatable.
Personally I do think there are places where (constant) random data is useful in testing - after you think you've done all your carefully-thought-out corners, using stimuli from a PRNG can sometimes find other things.
I am in favor of random tests, and I write them. However, whether they are appropriate in a particular build environment and which test suites they should be included in is a more nuanced question.
Run locally (e.g., overnight on your dev box), randomized tests have found bugs both obvious and obscure. The obscure ones are arcane enough that I think random testing was really the only realistic way to flush them out. As a test, I took one tough-to-find bug discovered via randomized testing and had half a dozen crack developers review the function (about a dozen lines of code) where it occurred. None were able to detect it.
Many of your arguments against randomized data are flavors of "the test isn't reproducible". However, a well-written randomized test will capture the seed used to start the randomized sequence and output it on failure. In addition to allowing you to repeat the test by hand, this lets you trivially create new tests which target the specific issue by hardcoding the seed. Of course, it's probably nicer to hand-code an explicit test covering that case, but laziness has its virtues, and this even allows you to essentially auto-generate new test cases from a failing seed.
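A sketch of that seed-capture pattern in Go (double is an invented stand-in; the seed is logged so a failure can be reproduced by hard-coding it):

package example

import (
    "math/rand"
    "testing"
    "time"
)

// double is a stand-in for the code under test.
func double(x int) int { return x * 2 }

// Pick a fresh seed per run, but record it: t.Logf output is shown on
// failure (or with -v), so a failing run can be replayed exactly.
func TestDoubleRandomized(t *testing.T) {
    seed := time.Now().UnixNano()
    t.Logf("seed: %d (hard-code this to reproduce a failure)", seed)
    r := rand.New(rand.NewSource(seed))
    for i := 0; i < 1000; i++ {
        x := r.Intn(100)
        if got := double(x); got != 2*x {
            t.Fatalf("double(%d) = %d, want %d (seed %d)", x, got, 2*x, seed)
        }
    }
}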
The one point you make that I can't debate, however, is that it breaks the build systems. Most build and continuous integration tests expect the tests to do the same thing every time. So a test that randomly fails will create chaos, randomly failing and pointing the finger at changes that were harmless.
A solution then, is to still run your randomized tests as part of the build and CI tests, but run it with a fixed seed, for a fixed number of iterations. Hence the test always does the same thing, but still explores a bunch of the input space (if you run it for multiple iterations).
Locally, e.g., when changing the concerned class, you are free to run it for more iterations or with other seeds. If randomized testing ever becomes more popular, you could even imagine a specific suite of tests which are known to be random, which could be run with different seeds (hence with increasing coverage over time), and where failures wouldn't mean the same thing as deterministic CI systems (i.e., runs aren't associated 1:1 with code changes and so you don't point a finger at a particular change when things fail).
There is a lot to be said for randomized tests, especially well written ones, so don't be too quick to dismiss them!
If you are doing TDD then I would argue that random data is an excellent approach. If your test is written with constants, then you can only guarantee your code works for the specific value. If your test is randomly failing the build server there is likely a problem with how the test was written.
Random data will help ensure any future refactoring will not rely on a magic constant. After all, if your tests are your documentation, then doesn't the presence of constants imply it only needs to work for those constants?
I am exaggerating, but I prefer to inject random data into my test as a sign that "the value of this variable should not affect the outcome of this test".
I will say, though, that if you use a random variable and then fork your test based on that variable, that is a smell.
In the book Beautiful Code, there is a chapter called "Beautiful Tests", where the author goes through a testing strategy for the binary search algorithm. One paragraph is called "Random Acts of Testing", in which he creates random arrays to thoroughly test the algorithm. You can read some of it online at Google Books (page 95), but it's a great book worth having.
So basically this just shows that generating random data for testing is a viable option.
Your co-worker is doing fuzz testing, although he doesn't know about it. Fuzz tests are especially valuable in server systems.
One advantage for someone looking at the tests is that arbitrary data is clearly not important. I've seen too many tests that involved dozens of pieces of data and it can be difficult to tell what needs to be that way and what just happens to be that way. E.g. If an address validation method is tested with a specific zip code and all other data is random then you can be pretty sure the zip code is the only important part.
if it's a random value and the test fails, we need to a) fix the object and b) force ourselves to test for that value every time, so we know it works, but since it's random we don't know what the value was
If your test case does not accurately record what it is testing, perhaps you need to recode the test case. I always want to have logs that I can refer back to for test cases so that I know exactly what caused it to fail whether using static or random data.
You should ask yourselves what is the goal of your test.
Unit tests are about verifying logic, code flow, and object interactions. Using random values pursues a different goal and thus reduces test focus and simplicity. It is acceptable for readability reasons (generating UUIDs, ids, keys, etc.).
Specifically for unit tests, I cannot recall even one instance where this method was successful in finding problems. But I have seen many determinism problems (in the tests) caused by trying to be clever with random values, mainly with random dates.
Fuzz testing is a valid approach for integration tests and end-to-end tests.
Can you generate some random data once (I mean exactly once, not once per test run), then use it in all tests thereafter?
I can definitely see the value in creating random data to test those cases that you haven't thought of, but you're right, having unit tests that can randomly pass or fail is a bad thing.
If you're using random input for your tests you need to log the inputs so you can see what the values are. This way if there is some edge case you come across, you can write the test to reproduce it. I've heard the same reasons from people for not using random input, but once you have insight into the actual values used for a particular test run then it isn't as much of an issue.
The notion of "arbitrary" data is also very useful as a way of signifying something that is not important. We have some acceptance tests that come to mind where there is a lot of noise data that has no relevance to the test at hand.
I think the problem here is that the purpose of unit tests is not catching bugs. The purpose is being able to change the code without breaking it. So how are you going to know that you broke your code when your random unit tests are green in your pipeline, just because they didn't touch the right path?
Doing this seems insane to me. A different situation would be running them as integration tests or e2e tests, not as part of the build, and only for some specific things, because in some situations you will need a mirror of your code in your asserts to test that way.
And having a test suite as complex as your real code is like not having tests at all, because who is going to test your suite then? :p
A unit test is there to ensure the correct behaviour in response to particular inputs, in particular all code paths/logic should be covered. There is no need to use random data to achieve this. If you don't have 100% code coverage with your unit tests, then fuzz testing by the back door is not going to achieve this, and it may even mean you occasionally don't achieve your desired code coverage. It may (pardon the pun) give you a 'fuzzy' feeling that you're getting to more code paths, but there may not be much science behind this. People often check code coverage when they run their unit tests for the first time and then forget about it (unless enforced by CI), so do you really want to be checking coverage against every run as a result of using random input data? It's just another thing to potentially neglect.
Also, programmers tend to take the easy path, and they make mistakes. They make just as many mistakes in unit tests as they do in the code under test. It's way too easy for someone to introduce random data and then tailor the asserts to the output order of a single run. Admit it, we've all done this. When the data changes, the order can change and the asserts fail, so a portion of the executions fail. This portion needn't be 1/2; I've seen exactly this result in failures 10% of the time. It takes a long time to track down problems like this, and if your CI doesn't record enough data about enough of the runs, it can be even worse.
Whilst there's an argument for saying "just don't make these mistakes", in a typical commercial programming setup there'll be a mix of abilities, sometimes relatively junior people reviewing code for other junior people. You can write literally dozens more tests in the time it takes to debug one non-deterministic test and fix it, so make sure you don't have any. Don't use random data.
In my experience unit tests and randomized tests should be separated. Unit tests serve to give a certainty of the correctness of some cases, not only to catch obscure bugs.
All that said, randomized testing is useful and should be done, separately from unit tests, but it should test a series of randomized values.
I can't help thinking that testing one random value per run is just not enough, either to be a sufficient randomized test or to be a truly useful unit test.
Another aspect is validating the test results. If you have random inputs, you have to calculate the expected output for them inside the test. This will at some level duplicate the tested logic, making the test only a mirror of the tested code itself. This will not sufficiently test the code, since the test might contain the same errors the original code does.
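One common mitigation, sketched here rather than taken from the answer itself, is to assert properties or invariants of the output instead of recomputing it; for example, for a sort:

package example

import (
    "math/rand"
    "sort"
    "testing"
)

// Instead of recomputing the expected output (which would mirror the
// code under test), check invariants of the result: here, that the
// output of sorting is actually in order and has the same length.
func TestSortInvariants(t *testing.T) {
    r := rand.New(rand.NewSource(7)) // fixed seed for repeatability
    in := make([]int, 50)
    for i := range in {
        in[i] = r.Intn(1000)
    }
    out := append([]int(nil), in...)
    sort.Ints(out)
    if len(out) != len(in) {
        t.Fatalf("length changed: got %d, want %d", len(out), len(in))
    }
    if !sort.IntsAreSorted(out) {
        t.Errorf("output not sorted: %v", out)
    }
}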
This is an old question, but I wanted to mention a library I created that generates objects filled with random data. It supports reproducing the same data if a test fails by supplying a seed. It also supports JUnit 5 via an extension.
Example usage:
Person person = Instancio.create(Person.class);
Or a builder API for customising generation parameters:
Person person = Instancio.of(Person.class)
    .generate(field("age"), gen -> gen.ints().min(18).max(65))
    .create();
The GitHub link has more examples: https://github.com/instancio/instancio
You can find the library on maven central:
<dependency>
<groupId>org.instancio</groupId>
<artifactId>instancio-junit</artifactId>
<version>LATEST</version>
</dependency>
Depending on your object/app, random data would have a place in load testing. I think more important would be to use data that explicitly tests the boundary conditions of the data.
We just ran into this today. I wanted pseudo-random data (so it would look like compressed audio data in terms of size), and I noted in a TODO that I also wanted it deterministic. rand() behaved differently on OS X than on Linux, and unless I re-seeded, it could change at any time. So we changed it to be deterministic but still pseudo-random: the test is repeatable, as much as using canned data (but more conveniently written).
This was NOT testing by some random brute force through code paths. That's the difference: still deterministic, still repeatable, still using data that looks like real input to run a set of interesting checks on edge cases in complex logic. Still unit tests.
Does that still qualify as random? Let's talk over a beer. :-)
I can envisage three solutions to the test data problem:
Test with fixed data
Test with random data
Generate random data once, then use it as your fixed data
I would recommend doing all of the above. That is, write repeatable unit tests with both some edge cases worked out using your brain, and some randomised data which you generate only once. Then write a set of randomised tests that you run as well.
The randomised tests should never be expected to catch something your repeatable tests miss. You should aim to cover everything with repeatable tests, and consider the randomised tests a bonus. If they find something, it should be something that you couldn't have reasonably predicted; a real oddball.
How can your guy run the test again after it has failed, to see if he has fixed it? I.e., he loses the repeatability of the tests.
While I think there is probably some value in flinging a load of random data at tests, as mentioned in other replies it falls more under the heading of load testing than anything else. It is pretty much a "testing-by-hope" practice. I think that, in reality, your guy is simply not thinking about what he is trying to test, and is making up for that lack of thought by hoping randomness will eventually trap some mysterious error.
So the argument I would use with him is that he is being lazy. Or, to put it another way, if he doesn't take the time to understand what he is trying to test it probably shows he doesn't really understand the code he is writing.