I'm just getting into unit testing, and have written some short tests to check whether a function called isPrime() works correctly.
I've got a test that checks that the function works, with test data in the form of a few numbers and their expected return values.
How many should I test? How do I decide which ones to test? What are the best practices here?
One approach would be to generate 1000 primes and loop through them all; another would be to just select 4 or 5 and test them. What's the correct thing to do?
I've also been informed that every time a bug is found, you should write a test to verify that it is fixed. It seems reasonable to me, anyway.
You'd want to check edge cases. How big a prime number is your method supposed to be able to handle? This will depend on what representation (type) you used. If you're only interested in small (a very relative term in number theory) primes, you're probably using int or long. Test a handful of the biggest primes that fit in the representation you've chosen. Make sure you check some non-prime numbers too. (These are much easier to verify independently.)
Naturally, you'll also want to test a few small numbers (primes and non-primes) and a few in the middle of the range. A handful of each should be plenty. Also make sure you throw an exception (or return an error code, whichever is your preference) for numbers that are out of range of your valid inputs.
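For illustration, here is a minimal sketch of such a test in Python, assuming a hypothetical is_prime(n) that raises ValueError for out-of-range input; the names, the contract and the framework are placeholders, not the asker's actual code:

import unittest

def is_prime(n):
    # Placeholder implementation so the sketch runs; swap in the function under test.
    if n < 0:
        raise ValueError("n must be non-negative")
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

class IsPrimeEdgeCaseTests(unittest.TestCase):
    def test_small_primes_and_non_primes(self):
        for n, expected in [(0, False), (1, False), (2, True), (3, True), (4, False), (17, True), (21, False)]:
            self.assertEqual(is_prime(n), expected, msg=str(n))

    def test_biggest_values_the_representation_allows(self):
        # 2**31 - 1 is prime; its even neighbour is not.
        self.assertTrue(is_prime(2**31 - 1))
        self.assertFalse(is_prime(2**31 - 2))

    def test_out_of_range_input(self):
        # Assumes the contract is "raise for negative input"; adjust to your own contract.
        with self.assertRaises(ValueError):
            is_prime(-1)

if __name__ == "__main__":
    unittest.main()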
in general, test as many cases as you need to feel comfortable/confident
also in general, test the base/zero case, the maximum case, and at least one median/middle case
also test expected-exception cases, if applicable
if you're unsure of your prime algorithm, then by all means test it with the first 1000 primes or so, to gain confidence
"Beware of bugs. I have proven the above algorithm correct, but have not tested it yet."
Some people don't understand the above quote (paraphrase?), but it makes perfect sense when you think about it. Tests will never prove an algorithm correct; they only help to indicate whether you've coded it right. Write tests for mistakes you expect might appear and for boundary conditions to achieve good code coverage. Don't just pick values out of the blue to see if they work, because that can lead to lots of tests which all test exactly the same thing.
For your example, just hand-select a few primes and non-primes to test specific conditions in the implementation.
Ask yourself: what EXACTLY do I want to test, and test the most important things. Test to make sure it basically does what you are expecting it to do in the expected cases.
Testing all those nulls and edge cases - I don't think that's realistic; it's too time-consuming, and someone needs to maintain all that later!
And...your test code should be simple enough so that you do not need to test your test code!
If you want to check that your function correctly applies the algorithm and works in general, a few primes will probably be enough.
If you want to prove that the method for finding primes is CORRECT, 100,000 primes will not be enough. But you (probably) don't want to test the latter.
Only you know what you want to test!
PS: I think using loops in unit tests is not always wrong, but I would think twice before doing it. Test code should be VERY simple. What if something goes wrong and there is a bug in your test? However, you should try to avoid test code duplication just as you avoid regular code duplication. Someone has to maintain the test code.
A few questions, the answers may inform your decision:
How important is the correct functioning of this code?
Is the implementation of this code likely to be changed in the future? (if so, test more to support the future change)
Is the public contract of this code likely to change in the future? (if so, test less - to reduce the amount of throw-away test code)
How is the code coverage, are all branches visited?
Even if the code doesn't branch, are boundary considerations tested?
Do the tests run quickly?
Edit: Hmm, so to advise in your specific scenario. Since you started writing unit tests yesterday, you might not have the experience to decide among all these factors. Let me help you:
This code is probably not too important (no one dies, no one goes to war, no one is sued), so a smattering of tests will be fine.
The implementation probably won't change (prime number techniques are well known), so we don't need tests to support this. If the implementation does change, it's probably due to an observed failing value. That can be added as a new test at the time of change.
The public contract of this won't change.
Get 100% code coverage on this. There's no reason to write code that a test doesn't visit in this circumstance. You should be able to do this with a small number of tests.
If you care what the code does when zero is called, test that.
The small number of tests should run quickly. This will allow them to be run frequently (both by developers and by automation).
I would test 1, 2, 3, 21, 23, a "large" prime (5 digits), a "large" non-prime and 0 if you care what this does with 0.
To be really sure, you're going to have to test them all. :-)
Seriously though, for this kind of function you're probably using an established and proven algorithm. The main thing you need to do is verify that your code correctly implements the algorithm.
The other thing is to make sure you understand the limits of your number representation, whatever that is. At the very least, this will put an upper limit on the size of the numbers you can test. E.g., if you use a 32-bit unsigned int, you're never going to be able to test values larger than 4G. Possibly your limit will be lower than that, depending on the details of your implementation.
Just as an example of something that could go wrong with an implementation:
A simple algorithm for testing primes is to try dividing the candidate by all known primes up to the square root of the candidate. The square root function will not necessarily give an exact result, so to be safe you should go a bit past that. How far past would depend on specifically how the square root function is implemented and how much it could be off.
Another note on testing: In addition to testing known primes to see if your function correctly identifies them as prime, also test known composite numbers to make sure you're not getting "false positives." To make sure you get that square root function thing right, pick some composite numbers that have a prime factor as close as possible to their square root.
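One common way to sidestep the floating-point square root entirely is to compare i*i against the candidate. A sketch in Python (illustrative, not necessarily the implementation under discussion), along with the near-square composites that catch an off-by-one in the loop bound:

def is_prime(n):
    """Trial division; avoids floating-point sqrt by testing i*i <= n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:        # integer comparison, so no rounding error possible here
        if n % i == 0:
            return False
        i += 2
    return True

# Composites whose smallest prime factor sits right at the square root are
# exactly the cases that expose an off-by-one in the loop bound.
assert not is_prime(25)      # 5 * 5
assert not is_prime(49)      # 7 * 7
assert not is_prime(10201)   # 101 * 101
assert is_prime(10007)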
Also, consider how you're going to "generate" your list of primes for testing. Can you trust that list to be correct? How were those numbers tested, and by whom?
You might consider coding two functions and testing them against each other. One could be a simple but slow algorithm that you can be more sure of having coded correctly, and the other a faster but more complex one that you really want to use in your app, but is more likely to have a coding mistake.
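A sketch of that cross-check in Python; both implementations here are hypothetical stand-ins for your slow-but-obvious reference and your fast production version:

def is_prime_slow(n):
    """Obviously-correct reference: check every possible divisor."""
    return n >= 2 and all(n % d != 0 for d in range(2, n))

def is_prime_fast(n):
    """Stand-in for the optimised implementation you actually want to ship."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def test_fast_matches_slow_reference():
    # Any disagreement points at a bug in one of the two implementations.
    for n in range(0, 5000):
        assert is_prime_fast(n) == is_prime_slow(n), n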
"If it's worth building, it's worth testing"
"If it's not worth testing, why are you wasting your time working on it?"
I'm guessing you didn't subscribe to test first, where you write the test before you write the code?
Not to worry, I'm sure you are not alone.
As has been said above, test the edges - a great place to start. You must also test bad cases; if you only test what you know works, you can be sure that a bad case will happen in production at the worst time.
Oh and "I've just found the last bug" - HA HA.
Related
Let's say I have a function
function (int x) {
if (x < 10) return true;
return false;
}
Ideally, you would want to write 2^32 test cases to cover everything from INT_MIN to INT_MAX. Of course this is not practical.
To make life easier, we write test cases for
x < 10, test x = 9 expect true
x == 10, test x = 10 expect false
x > 10, test x = 11 expect false
These test cases are fine but it does not cover every case. Let's say one day someone modified the function to be
function (int x) {
if (x == 12) return true;
if (x < 10) return true;
return false;
}
he will run the tests and see that they all pass. How do we make sure we cover every scenario without going to extremes? Is there a keyword for the issue I am describing?
This is partly a comment partly an answer because of the way you phrased the question.
The comment
Is it possible to write a unit test that cover everything?
No. Even in your example you limit the test cases to 2^32, but what if the code is moved to a 64-bit system and then someone adds a line using 2^34 or something?
Also, your question indicates to me that you are thinking of static test cases with dynamic code - i.e., the code is dynamic in that it is changed over time by a programmer, not dynamically modified by the code itself. You should be thinking dynamic test cases with dynamic code.
Lastly you did not note if it was white, gray or black box testing.
The answer
Let a tool analyze the code and generate the tests data.
See: A Survey on Automatic Test Data Generation
Also you asked about key words for searching.
Here is a Google search for this that I found of value:
code analysis automated test generation survey
Related
I have never used one of these test case tools myself as I use Prolog DCG to generate my test cases and currently with a project I am doing generate millions of test cases in about two minutes and test them over a few minutes. Some of the test cases that fail I would never have thought up on my own so this may be considered overkill by some, but it works.
Since many people don't know Prolog DCGs here is a similar way explained using C# with LINQ by Eric Lippert, Every Binary Tree There Is
No, there's not currently a general algorithm for this that doesn't involve some kind of very intensive computation (e.g. testing lots and lots of cases), but you can write your unit tests in such a way that they'll have a higher probability of failing in the case of a change to the method. For example, in the answer given, write a test for x = 10. For the other two cases, first pick a couple of random numbers between 11 and int.Max and test those. Then test a couple of random numbers between int.Min and 9. The test wouldn't necessarily fail after the modification you describe, but there's a better chance that it would fail than if you had just hardcoded the value.
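A rough sketch of that idea in Python; the fixed seed keeps the random probes repeatable from run to run, and function_under_test is just a stand-in for the method from the question:

import random

def function_under_test(x):
    # Placeholder for the method in the question.
    return x < 10

def test_boundary_and_random_samples():
    rng = random.Random(42)                  # fixed seed => the same "random" values every run
    assert function_under_test(9) is True
    assert function_under_test(10) is False
    # A few random probes on each side of the boundary make a hardcoded
    # special case (like the x == 12 change) more likely to be caught.
    for _ in range(5):
        assert function_under_test(rng.randint(11, 2**31 - 1)) is False
        assert function_under_test(rng.randint(-2**31, 9)) is True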
Also, as @GuyCoder pointed out in his excellent answer, even if you did try to do something like that, it's remarkably difficult (or impossible) to prove that there are no possible changes to a method that would break your test.
Also, keep in mind that no kind of test automation (including unit testing) is a foolproof method of testing; even under ideal conditions, you generally can't 100% prove that your program is correct. Keep in mind that virtually all software testing approaches are fundamentally empirical methods and empirical methods can't really achieve 100% certainty. (They can achieve a good deal of certainty, though; in fact, many scientific papers achieve 95% certainty or higher - sometimes much higher - so in cases like that the difference may not be all that important). For example, even if you have 100% code coverage, how do you know that there's not an error in the tests somewhere? Are you going to write tests for the tests? (This can lead to a turtles all the way down type situation).
If you want to get really literal about it and you buy into David Hume, you really can't ever be 100% sure about something based on empirical testing; the fact that a test has passed every time you've run it doesn't mean that it'll continue to pass in the future. I digress, though.
If you're interested, formal verification studies methods of deductively proving that the software (or, at least, certain aspects of the software) is correct. Note that the major issue is that it tends to be very difficult or impossible to achieve formal verification of a complete system of any complexity, especially if you're using third-party libraries that aren't formally verified. (That, along with the difficulty of learning the techniques in the first place, is one of the main reasons formal verification hasn't really taken off outside of academia and certain very narrow industry applications.)
A final point: software ships with bugs. You'd be hard-pressed to find any complicated system that was 100% defect-free at the time that it was released. As I mentioned above, there is no currently-known technique to guarantee that your testing found all of the bugs (and if you can find one you'll become a very wealthy individual), so for the most part you'll have to rely on statistical measures to know whether you've tested adequately.
TL;DR No, you can't, and even if you could you still couldn't be 100% sure that your software was correct (there might be a bug in your tests, for example). For the foreseeable future, your unit test cases will need maintenance too. You can write the tests to be more resilient against changes, though.
I have never used unit testing before, so I'm giving CxxTest a go. I wrote a test to check if a function correctly sorts a std::vector. First I made sure the test failed when the vector wasn't sorted, and then as a sanity check I tested whether std::sort worked (it did, of course). So far, so good.
Then I started writing my own sorting function. However, I made a mistake and the function didn't sort correctly. Since my test didn't output the intermediate states of a vector as it was being sorted, it was difficult to tell where I had gone wrong in my sorting function. I ended up using cout statements (I could have used a debugger) to find my bug, and never used the unit test until after I knew my sort function worked.
Am I doing something wrong here? I thought unit testing was as simple as
1) Write test
2) Write function
3) Test function
4) If test fails, revise function
5) Repeat 3 and 4 until test passes
The process I used was more like
1) Write test
2) Write function
3) Test function
4) If test fails, debug function until it works correctly
5) Repeat 3 (even though function is already known to work)
I feel like my process was not truly TDD, because the design of my sorting function was not driven by the test I wrote. Should I have written more tests, e.g. tests that check the intermediate states of a vector as it's being sorted?
Tests are not supposed to debug your code for you.
I feel like my process was not truly TDD
You wrote a test. It found a bug. You fixed the bug. Your test passed. The system works!
This is the essence of test-driven development. Your tests tell you when you have a bug, and they tell you when you're done.
Anyway, feeling guilt because you're not achieving pure TDD or pure OOP or whatever is a psychological disorder. Go forth and be productive.
Don't try to test all intermediate states. No one cares how your sort algorithm works, just that it does its job reliably and quickly.
Instead, write your tests to check for sorted-ness for many different data sets. Test all typical problem sets: already sorted data, reverse-sorted data, random data, etc.
If your application requires a stable sort, your checking will have to be more careful. You might add a unique tag to each item being sorted just for testing purposes which the sort's comparison function doesn't test when sorting, but which can be used to ensure that two otherwise-equal values end up in the same relative order in the final output.
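Here is a language-agnostic sketch of that tagging idea (shown in Python for brevity; my_stable_sort and the key function are placeholders for whatever you are actually testing):

def check_stable_sort(my_stable_sort, values):
    # Tag each value with its original position; the tag is NOT part of the sort key.
    tagged = [(v, i) for i, v in enumerate(values)]
    result = my_stable_sort(tagged, key=lambda pair: pair[0])
    # Sorted-ness on the visible key...
    assert all(a[0] <= b[0] for a, b in zip(result, result[1:]))
    # ...and equal keys must keep their original relative order (stability).
    for a, b in zip(result, result[1:]):
        if a[0] == b[0]:
            assert a[1] < b[1]

# Example usage against Python's built-in (stable) sort:
check_stable_sort(sorted, [3, 1, 2, 1, 3, 1])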
A final bit of advice: in testing, do make some effort to think of all possible failure cases up front, but don't expect to succeed. Be prepared to add tests as you discover more edge cases later. Test suites should evolve toward correctness, not be expected to be perfect up front, unless there is a mathematical reason why they should be correct.
Unit testing focuses on the specific external behaviours of the thing you are writing; it can't really understand the intermediate states of an algorithm. A sorting function is a rather special case.
More usually we are dealing with business logic of the kind
"The Order price is the sum of the prices of the order items reduced by a discount of 10% if the total value is greater than £20 and by a further 5% if the customer is a gold member"
We immediately can write tests such as
No order items
One order item value £20.00
One order item value £20.01
Two order items total value £20.00 gold customer
...
and so on - now it should be clear that these tests apply to different branches of the code and do help get it right.
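As a sketch of how those cases translate into tests (Python, with made-up names; whether the extra 5% applies only above £20 is my reading of the rule, not something stated in the answer):

def order_price(item_prices, gold_member=False):
    """Hypothetical implementation of the business rule described above."""
    total = sum(item_prices)
    if total > 20:
        total *= 0.90          # 10% discount when the total is over £20
        if gold_member:
            total *= 0.95      # further 5% for gold members
    return round(total, 2)

def test_order_price_rules():
    assert order_price([]) == 0
    assert order_price([20.00]) == 20.00                            # not over £20, no discount
    assert order_price([20.01]) == 18.01                            # just over £20, 10% off
    assert order_price([10.00, 10.00], gold_member=True) == 20.00   # exactly £20, no discount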
For your sort code it may be helpful to have tests such as
{ 0 }
{ 1, 2 }
{ 2, 1 }
{ 1, 1 }
and so on, but the tests don't really know whether you are doing QuickSort or BubbleSort or whatever.
The TDD process is
RED: write a test, verify it fails
GREEN: write just enough code to make it pass, verify it passes
REFACTOR: refactor the code - both the test and the production code
If you find yourself having to resort to the debugger at step 2, it could be that you're testing too much at a time. Divide and conquer. Although dividing may not be so easy for a sorting algorithm: did you start with sorting an empty vector, then a vector with a single element, then a vector with two elements already ordered, then a vector with two elements in the wrong order...?
I don't see a fundamental difference between the #4s in the sequences above. In TDD, you write unit tests so that, if they pass, you're fairly sure the code works. You work on the code until it passes. If you find a bug, you write another test to find it, and you are through working on the code when the tests pass. (If you're still not confident in it, write more tests.) In your case, you had more difficulty than you expected in getting the code to meet the test.
The advantage is not so much in getting code units to work as in knowing that they still work when you change things, and in having a clear definition of when they do work. (There are other advantages: for example, the tests serve as documentation of what the code is supposed to be doing.)
It may be that you want to write some smaller tests, but I'm not sure what you'd write that would be useful in the middle of a sort function. It seems to me that they'd be heavily dependent on how the function is implemented, and that seems to me against the spirit of TDD.
By the way, why are you writing your own sort function? I hope this is because you wanted to write a sort function (for class, or for fun, or to learn), and not for any production reason. The standard functionality is almost certainly going to be more reliable, easier to understand, and usually faster than anything you're going to write, and you shouldn't replace it with your own code without good reason.
In unit tests, I have become used to test methods applying some regular values, some values offending the method contract, and all border-cases I can come up with.
But is it very bad-practice to
test with random values - that is, a value within a range that you think should never give any trouble, so that each time the test runs a different value is passed in - as a kind of extended testing of regular values?
test whole ranges, using iteration?
I have a feeling that neither of these approaches is any good. With range testing I can imagine it's just not practical because of the time it takes, but what about randomness?
UPDATE:
I'm not using this technique myself; I was just wondering about it. Randomness can be a good tool, I know now, if you can make it reproducible when you need to.
The most interesting reply was the 'fuzzing' tip from Lieven :
http://en.wikipedia.org/wiki/Fuzz_testing
tx
Unit tests need to be fast. If they aren't, people won't run them regularly. At times I did write code for checking the whole range, but @Ignore'd or commented it out in the end because it made the tests too slow. If I were to use random values, I would go for a PRNG with fixed seeds so that every run actually checks the same numbers.
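A minimal sketch of that fixed-seed approach in Python; the property being checked is just a placeholder:

import random

def some_property(n):
    # Placeholder invariant; substitute the real check for your code under test.
    return (n * 2) % 2 == 0

def test_random_sample_is_repeatable():
    rng = random.Random(12345)   # fixed seed: every run checks exactly the same numbers
    for _ in range(200):         # a sample, not the whole range, keeps the test fast
        assert some_property(rng.randint(0, 10**6))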
Random input - the tests would not be repeatable (i.e., would not produce consistent results every time they are run) and hence are not considered good unit tests. Tests should not change their mind.
Range tests / RowTests - these are good as long as they don't slow down the test suite run. Each test should run as fast as possible (a done-in-30-sec test suite gets run more often than a 10-minute one) - preferably 100ms or less. That said, each input (test data) should be 'representative'. If all the input values exercise the code in the same way, testing each one isn't adding any value and is just routine number crunching; you just need one representative from that set of values. You also need representatives for boundary conditions and 'special' values.
For more on guidelines or thumbrules - see 'What makes a Good Unit Test?'
That said, the techniques you mentioned could be great for finding representative inputs. So use them to find a scenario X where the code fails or succeeds incorrectly - then write up a repeatable, quick, tests-one-thing-only unit test for that scenario X and add it to your test suite. If you find that these tools continue to help you find more good test cases, persist with them.
Response to OP's clarification:
If you use the same seed value (test input) for your random number generator on each test run, your test is not random - the values can be predetermined. However, a unit test ideally shouldn't need any input/output - that is why xUnit test cases have the void TC() signature.
If you use different seed values on each run, now your tests are random and not repeatable. Of course you can hunt down the special seed value in your log files to know what failed (and reproduce the error) but I like my tests to instantly let me know what failed - e.g. a Red TestConversionForEnums() lets me know that the Enum Conversion code is broken without any inspection.
Repeatable implies that each time the test is run on the SUT, it produces the same result (pass/fail) - not "Can I reproduce the test failure again?" (Repeatable != Reproducible). To reiterate: this kind of exploratory testing may be good for identifying more test cases, but I wouldn't add it to the test suite that I run each time I make a code change during the day. I'd recommend doing exploratory testing manually; find some good (some may say sadistic) testers that'll go hammer and tongs at your code - they will find you more test cases than a random input generator.
I have been using randomness in my testcases. It found me some errors in the SUT and it gave me some errors in my testcase.
Note that the test case gets more complex when you use randomness.
You'll need a method to run your testcase with the random value(s) it failed on
You'll need to log the random values used for every test.
...
All in all, I'm throttling back on using randomness but not dismissing it entirely. As with every technique, it has its value.
For a better explanation of what you are after, look up the term fuzzing
What you describe is usually called specification-based testing and has been implemented by frameworks such as QuickCheck (Haskell), scalacheck (Scala) and Quviq QuickCheck (Erlang).
Data-based testing tools (such as DataProvider in TestNG) can achieve similar results.
The underlying principle is to generate input data for the subject under test based upon some sort of specification and is far from "bad practice".
What are you testing? The random number generator? Or your code?
If your code, what if there is a bug in the code that produces random numbers?
What if you need to reproduce a problem, do you keep restarting the test hoping that it will eventually use the same sequence as you had when you discovered the problem?
If you decide to use a random number generator to produce data, at least seed it with a known constant value, so it's easy to reproduce.
In other words, your "random numbers" are just a "sequence of numbers I really don't care all that much about".
So long as it will tell you in some way what random value it failed on I don't suppose it's that bad. However, you're almost relying on luck to find a problem in your application then.
Testing the whole range will ensure you have every avenue covered but it seems like overkill when you have the edges covered and, I assume, a few middle-ground accepted values.
The goal of unit-testing is to get confidence in your code. Therefore, if you feel that using random values could help you find some more bugs, you obviously need more tests to increase your confidence level.
In that situation, you could rely on iteration-based testing to identify those problems.
I'd recommend creating new, specific tests for the cases discovered with the loop testing, and then removing the iteration-based tests so that they don't slow down your suite.
I have used randomness for debugging a field problem with a state machine leaking a resource. We code inspected, ran the unit-tests and couldn't reproduce the leak.
We fed random events from the entire possible event space into the state machine unit test environment. We looked at the invariants after each event and stopped when they were violated.
The random events eventually exposed a sequence of events that produced a leak.
The state machine leaked a resource when a 2nd error occurred while recovering from a first error.
We were then able to reproduce the leak in the field.
So randomness found a problem that was difficult to find otherwise. A little brute force but the computer didn't mind working the weekend.
I wouldn't advocate completely random values as it will give you a false sense of security. If you can't go through the entire range (which is often the case) it's far more efficient to select a subset by hand. This way you will also have to think of possible "odd" values, values that causes the code to run differently (and are not near edges).
You could use a random generator to generate the test values, check that they represent a good sample and then use them. This is a good idea especially if choosing by hand would be too time-consuming.
I did use random test values when I wrote a semaphore driver to use for a hw block from two different chips. In this case I couldn't figure out how to choose meaningful values for the timings so I randomized how often the chips would (independently) try to access the block. In retrospect it would still have been better to choose them by hand, because getting the test environment to work in such a way that the two chips didn't align themselves was not as simple as I thought. This was actually a very good example of when random values do not create a random sample.
The problem was caused by the fact that whenever one chip had reserved the block, the other waited and, true to semaphore behaviour, got access right after the first released it. When I plotted how long the chips had to wait for access, the values were in fact far from random. It was worst when I had the same value range for both random values; it got slightly better after I changed them to have different ranges, but it still wasn't very random. I only started getting something of a random test after I randomized both the waiting times between accesses and how long the block was reserved, and chose the four sets carefully.
In the end I probably ended up using more time writing the code to use "random" values than I would have used to pick meaningful values by hand in the first place.
See David Saff's work on Theory-Based Testing.
Generally I'd avoid randomness in unit tests, but the theory stuff is intriguing.
The 'key' point here is unit test. A slew of random values in the expected range, as well as edges for the good case and out-of-range/boundary values for the bad case, is valuable in a regression test, provided the seed is constant.
a unit test may use random values in the expected range, if it is possible to always save the inputs/outputs (if any) before and after.
NOTE: I mention the next couple of paragraphs as background. If you just want a TL;DR, feel free to skip down to the numbered questions as they are only indirectly related to this info.
I'm currently writing a python script that does some stuff with POSIX dates (among other things). Unit testing these seems a little bit difficult though, since there's such a wide range of dates and times that can be encountered.
Of course, it's impractical for me to try to test every single date/time combination possible, so I think I'm going to try a unit test that randomizes the inputs and then reports what the inputs were if the test failed. Statistically speaking, I figure that I can achieve a bit more completeness of testing than I could if I tried to think of all potential problem areas (due to missing things) or to test all cases (due to sheer infeasibility), assuming that I run it enough times.
So here are a few questions (mainly indirectly related to the above):
What types of code are good candidates for randomized testing? What types of code aren't?
How do I go about determining the number of times to run the code with randomized inputs? I ask this because I want to have a large enough sample to determine any bugs, but don't want to wait a week to get my results.
Are these kinds of tests well suited for unit tests, or is there another kind of test that it works well with?
Are there any other best practices for doing this kind of thing?
Related topics:
Random data in unit tests?
I agree with Federico - randomised testing is counterproductive. If a test won't reliably pass or fail, it's very hard to fix it and know it's fixed. (This is also a problem when you introduce an unreliable dependency, of course.)
Instead, however, you might like to make sure you've got good data coverage in other ways. For instance:
Make sure you have tests for the start, middle and end of every month of every year between 1900 and 2100 (if those are suitable for your code, of course).
Use a variety of cultures, or "all of them" if that's known.
Try "day 0" and "one day after the end of each month" etc.
In short, still try a lot of values, but do so programmatically and repeatably. You don't need every value you try to be a literal in a test - it's fine to loop round all known values for one axis of your testing, etc.
You'll never get complete coverage, but it will at least be repeatable.
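For instance, here is a sketch of that kind of programmatic, repeatable date coverage in Python; the round-trip below is only a stand-in for whatever your script actually does with POSIX dates:

import calendar
import datetime

def to_posix_days(d):
    """Stand-in for the code under test: days since the Unix epoch."""
    return (d - datetime.date(1970, 1, 1)).days

def from_posix_days(days):
    return datetime.date(1970, 1, 1) + datetime.timedelta(days=days)

def test_start_middle_end_of_every_month():
    # Programmatic, repeatable coverage: no randomness, same dates on every run.
    for year in range(1900, 2101):
        for month in range(1, 13):
            last_day = calendar.monthrange(year, month)[1]
            for day in (1, 15, last_day):
                d = datetime.date(year, month, day)
                assert from_posix_days(to_posix_days(d)) == d, d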
EDIT: I'm sure there are places where random tests are useful, although probably not for unit tests. However, in this case I'd like to suggest something: use one RNG to create a random but known seed, and then seed a new RNG with that value - and log it. That way if something interesting happens you will be able to reproduce it by starting an RNG with the logged seed.
With respect to the 3rd question, in my opinion random tests are not well suited for unit testing. If applied to the same piece of code, a unit test should succeed always, or fail always (i.e., wrong behavior due to bugs should be reproducible). You could however use random techniques to generate a large data set, then use that data set within your unit tests; there's nothing wrong with it.
Wow, great question! Some thoughts:
Random testing is always a good confidence building activity, though as you mentioned, it's best suited to certain types of code.
It's an excellent way to stress-test any code whose performance may be related to the number of times it's been executed, or to the sequence of inputs.
For fairly simple code, or code that expects a limited type of input, I'd prefer systematic test that explicitly cover all of the likely cases, samples of each unlikely or pathological case, and all the boundary conditions.
Q1) I found that distributed systems with lots of concurrency are good candidates for randomized testing. It is hard to create all possible scenarios for such applications, but random testing can expose problems that you never thought about.
Q2) I guess you could try to use statistics to build a confidence interval around having discovered all "bugs". But the practical answer is: run your randomized tests as many times as you can afford.
Q3) I have found that randomized testing is useful, but only after you have written the normal battery of unit, integration and regression tests. You should integrate your randomized tests into the normal test suite, though probably as a small run. If nothing else, you avoid bit rot in the tests themselves, and get some modicum of coverage as the team runs the tests with different random inputs.
Q4) When writing randomized tests, make sure you save the random seed with the results of the tests. There is nothing more frustrating than finding that your random tests caught a bug and not being able to run the test again with the same input. Make sure your test can be executed with the saved seed, too.
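One way to follow that seed-saving advice in Python; the environment variable name and logging mechanism are just illustrative:

import os
import random

def make_rng(logger=print):
    """Create an RNG whose seed is logged, or taken from an env var to replay a failure."""
    seed = int(os.environ.get("TEST_SEED", random.randrange(2**32)))
    logger(f"randomized test using seed={seed}")   # keep this with the test results
    return random.Random(seed)

def test_with_reproducible_randomness():
    rng = make_rng()
    data = [rng.randint(-1000, 1000) for _ in range(100)]
    # Trivial placeholder property; substitute the real check for your code.
    assert sorted(data) == sorted(data, reverse=True)[::-1]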
A few things:
With random testing, you can't really tell how good a piece of code is, but you can tell how bad it is.
Random testing is better suited for things that have random inputs -- a prime example is anything that's exposed to users. So, for example, something that randomly clicks & types all over your app (or OS) is a good test of general robustness.
Similarly, developers count as users. So something that randomly assembles a GUI from your framework is another good candidate.
Again, you're not going to find all the bugs this way -- what you're looking for is "if I do a million whacky things, do ANY of them result in system corruption?" If not, you can feel some level of confidence that your app/OS/SDK/whatever might hold up to a few days' exposure to users.
...But, more importantly, if your random-beater-upper test app can crash your app/OS/SDK in about 5 minutes, that's about how long you'll have until the first fire-drill if you try to ship that sucker.
Also note: REPRODUCIBILITY IS IMPORTANT IN TESTING! Hence, have your test-tool log the random-seed that it used, and have a parameter to start with the same seed. In addition, have it either start from a known "base state" (i.e., reinstall everything from an image on a server & start there) or some recreatable base-state (i.e., reinstall from that image, then alter it according to some random-seed that the test tool takes as a parameter.)
Of course, the developers will appreciate if the tool has nice things like "save state every 20,000 events" and "stop right before event #" and "step forward 1/10/100 events." This will greatly aid them in reproducing the problem, finding and fixing it.
As someone else pointed out, servers are another thing exposed to users. Get yourself a list of 1,000,000 URLs (grep from server logs), then feed them to your random number generator.
And remember: "system went 24 hours of random pounding without errors" does not mean it's ready to ship, it just means it's stable enough to start some serious testing. Before it can do that, QA should feel free to say "look, your POS can't even last 24 hours under life-like random user simulation -- you fix that, I'm going to spend some time writing better tools."
Oh yeah, one last thing: in addition to the "pound it as fast & hard as you can" tests, have the ability to do "exactly what a real user [who was perhaps deranged, or a baby bounding the keyboard/mouse] would do." That is, if you're doing random user-events; do them at the speed that a very-fast typist or very-fast mouse-user could do (with occasional delays, to simulate a SLOW person), in addition to "as fast as my program can spit-out events." These are two **very different* types of tests, and will get very different reactions when bugs are found.
To make tests reproducible, simply use a fixed seed start value. That ensures the same data is used whenever the test runs. Tests will reliably pass or fail.
Good / bad candidates? Randomized tests are good at finding edge cases (exceptions). A problem is to define the correct result of a randomized input.
Determining the number of times to run the code: Simply try it out, if it takes too long reduce the iteration count. You may want to use a code coverage tool to find out what part of your application is actually tested.
Are these kinds of tests well suited for unit tests? Yes.
This might be slightly off-topic, but if you're using .net, there is Pex, which does something similar to randomized testing, but with more intuition by attempting to generate a "random" test case that exercises all of the paths through your code.
Here is my answer to a similar question: Is it a bad practice to randomly-generate test data?. Other answers may be useful as well.
Random testing is a bad practice as long as you don't have a solution for the oracle problem, i.e., determining which is the expected outcome of your software given its input.
If you solved the oracle problem, you can get one step further than simple random input generation. You can choose input distributions such that specific parts of your software get exercised more than with simple random input.
You then switch from random testing to statistical testing.
if (a > 0)
    // Do Foo
else if (b < 0)
    // Do Bar
else
    // Do Foobar
If you select a and b randomly over the int range, you exercise Foo 50% of the time, Bar 25% of the time and Foobar 25% of the time. It is likely that you will find more bugs in Foo than in Bar or Foobar.
If you select a such that it is negative 66.66% of the time, Bar and Foobar get exercised more than with your first distribution. Indeed, the three branches each get exercised 33.33% of the time.
Of course, if your observed outcome is different from your expected outcome, you have to log everything that can be useful to reproduce the bug.
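A sketch of that shift from uniform random input to a chosen distribution, using the quoted branch example (Python, purely illustrative):

import random
from collections import Counter

def branch(a, b):
    # The example from the quoted answer.
    if a > 0:
        return "foo"
    elif b < 0:
        return "bar"
    else:
        return "foobar"

def sample_inputs(rng, n):
    """Choose a non-positive ~66.6% of the time so each branch is hit ~33% of the time."""
    for _ in range(n):
        if rng.random() < 2 / 3:
            a = rng.randint(-2**31, 0)
        else:
            a = rng.randint(1, 2**31 - 1)
        b = rng.randint(-2**31, 2**31 - 1)
        yield a, b

rng = random.Random(0)
print(Counter(branch(a, b) for a, b in sample_inputs(rng, 30000)))
# prints roughly equal counts for 'foo', 'bar' and 'foobar'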
Random testing has the huge advantage that individual tests can be generated for extremely low cost. This is true even if you only have a partial oracle (for example, does the software crash?)
In a complex system, random testing will find bugs that are difficult to find by any other means. Think about what this means for security testing: even if you don't do random testing, the black hats will, and they will find bugs you missed.
A fascinating subfield of random testing is randomized differential testing, where two or more systems that are supposed to show the same behavior are stimulated with a common input. If their behavior differs, a bug (in one or both) has been found. This has been applied with great effect to testing of compilers, and invariably finds bugs in any compiler that has not been previously confronted with the technique. Even if you have only one compiler you can try it on different optimization settings to look for varying results, and of course crashes always mean bugs.
I have a coworker who writes unit tests for objects which fill their fields with random data. His reason is that it gives a wider range of testing, since it will test a lot of different values, whereas a normal test only uses a single static value.
I've given him a number of different reasons against this, the main ones being:
random values means the test isn't truly repeatable (which also means that if the test can randomly fail, it can do so on the build server and break the build)
if it's a random value and the test fails, we need to a) fix the object and b) force ourselves to test for that value every time, so we know it works, but since it's random we don't know what the value was
Another coworker added:
If I am testing an exception, random values will not ensure that the test ends up in the expected state
random data is used for flushing out a system and load testing, not for unit tests
Can anyone else add additional reasons I can give him to get him to stop doing this?
(Or alternately, is this an acceptable method of writing unit tests, and I and my other coworker are wrong?)
There's a compromise. Your coworker is actually onto something, but I think he's doing it wrong. I'm not sure that totally random testing is very useful, but it's certainly not invalid.
A program (or unit) specification is a hypothesis that there exists some program that meets it. The program itself is then evidence of that hypothesis. What unit testing ought to be is an attempt to provide counter-evidence to refute that the program works according to the spec.
Now, you can write the unit tests by hand, but it really is a mechanical task. It can be automated. All you have to do is write the spec, and a machine can generate lots and lots of unit tests that try to break your code.
I don't know what language you're using, but see here:
Java
http://functionaljava.org/
Scala (or Java)
http://github.com/rickynils/scalacheck
Haskell
http://www.cs.chalmers.se/~rjmh/QuickCheck/
.NET:
http://blogs.msdn.com/dsyme/archive/2008/08/09/fscheck-0-2.aspx
These tools will take your well-formed spec as input and automatically generate as many unit tests as you want, with automatically generated data. They use "shrinking" strategies (which you can tweak) to find the simplest possible test case to break your code and to make sure it covers the edge cases well.
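In Python, the Hypothesis library works in the same QuickCheck-inspired spirit; here is a minimal property-based sketch for a sort function (my_sort is a placeholder for whatever you want to check):

from collections import Counter

from hypothesis import given, strategies as st

def my_sort(xs):
    # Placeholder for the implementation under test.
    return sorted(xs)

@given(st.lists(st.integers()))
def test_sort_properties(xs):
    result = my_sort(xs)
    # Output is ordered...
    assert all(a <= b for a, b in zip(result, result[1:]))
    # ...and is a permutation of the input (nothing added, lost or duplicated).
    assert Counter(result) == Counter(xs)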
Happy testing!
This kind of testing is called a Monkey test. When done right, it can smoke out bugs from the really dark corners.
To address your concerns about reproducibility: the right way to approach this, is to record the failed test entries, generate a unit test, which probes for the entire family of the specific bug; and include in the unit test the one specific input (from the random data) which caused the initial failure.
There is a half-way house here which has some use, which is to seed your PRNG with a constant. That allows you to generate 'random' data which is repeatable.
Personally I do think there are places where (constant) random data is useful in testing - after you think you've done all your carefully-thought-out corners, using stimuli from a PRNG can sometimes find other things.
I am in favor of random tests, and I write them. However, whether they are appropriate in a particular build environment and which test suites they should be included in is a more nuanced question.
Run locally (e.g., overnight on your dev box), randomized tests have found bugs both obvious and obscure. The obscure ones are arcane enough that I think random testing was really the only realistic way to flush them out. As a test, I took one tough-to-find bug discovered via randomized testing and had a half dozen crack developers review the function (about a dozen lines of code) where it occurred. None were able to detect it.
Many of your arguments against randomized data are flavors of "the test isn't reproducible". However, a well-written randomized test will capture the seed used to start the randomized generator and output it on failure. In addition to allowing you to repeat the test by hand, this allows you to trivially create new tests which target the specific issue by hardcoding the seed for that test. Of course, it's probably nicer to hand-code an explicit test covering that case, but laziness has its virtues, and this even allows you to essentially auto-generate new test cases from a failing seed.
The one point you make that I can't debate, however, is that it breaks the build systems. Most build and continuous integration tests expect the tests to do the same thing, every time. So a test that randomly fails will create chaos, randomly failing and pointing the fingers at changes that were harmless.
A solution then, is to still run your randomized tests as part of the build and CI tests, but run it with a fixed seed, for a fixed number of iterations. Hence the test always does the same thing, but still explores a bunch of the input space (if you run it for multiple iterations).
Locally, e.g., when changing the concerned class, you are free to run it for more iterations or with other seeds. If randomized testing ever becomes more popular, you could even imagine a specific suite of tests which are known to be random, which could be run with different seeds (hence with increasing coverage over time), and where failures wouldn't mean the same thing as deterministic CI systems (i.e., runs aren't associated 1:1 with code changes and so you don't point a finger at a particular change when things fail).
There is a lot to be said for randomized tests, especially well written ones, so don't be too quick to dismiss them!
If you are doing TDD then I would argue that random data is an excellent approach. If your test is written with constants, then you can only guarantee your code works for the specific value. If your test is randomly failing the build server there is likely a problem with how the test was written.
Random data will help ensure any future refactoring will not rely on a magic constant. After all, if your tests are your documentation, then doesn't the presence of constants imply it only needs to work for those constants?
I am exaggerating, however; I prefer to inject random data into my test as a sign that "the value of this variable should not affect the outcome of this test".
I will say, though, that if you use a random variable and then fork your test based on that variable, that is a smell.
In the book Beautiful Code, there is a chapter called "Beautiful Tests", where the author goes through a testing strategy for the binary search algorithm. One paragraph is called "Random Acts of Testing", in which he creates random arrays to thoroughly test the algorithm. You can read some of this online at Google Books, page 95, but it's a great book worth having.
So basically this just shows that generating random data for testing is a viable option.
Your co-worker is doing fuzz-testing, although he doesn't know about it. They are especially valuable in server systems.
One advantage for someone looking at the tests is that arbitrary data is clearly not important. I've seen too many tests that involved dozens of pieces of data and it can be difficult to tell what needs to be that way and what just happens to be that way. E.g. If an address validation method is tested with a specific zip code and all other data is random then you can be pretty sure the zip code is the only important part.
if it's a random value and the test fails, we need to a) fix the object and b) force ourselves to test for that value every time, so we know it works, but since it's random we don't know what the value was
If your test case does not accurately record what it is testing, perhaps you need to recode the test case. I always want to have logs that I can refer back to for test cases so that I know exactly what caused it to fail whether using static or random data.
You should ask yourselves what is the goal of your test.
Unit tests are about verifying logic, code flow and object interactions. Using random values tries to achieve a different goal and thus reduces test focus and simplicity. Random values are acceptable for readability reasons (generating UUIDs, ids, keys, etc.).
Specifically for unit tests, I cannot recall even one time this method was successful at finding problems. But I have seen many determinism problems (in the tests) caused by trying to be clever with random values, mainly with random dates.
Fuzz testing is a valid approach for integration tests and end-to-end tests.
Can you generate some random data once (I mean exactly once, not once per test run), then use it in all tests thereafter?
I can definitely see the value in creating random data to test those cases that you haven't thought of, but you're right, having unit tests that can randomly pass or fail is a bad thing.
If you're using random input for your tests you need to log the inputs so you can see what the values are. This way if there is some edge case you come across, you can write the test to reproduce it. I've heard the same reasons from people for not using random input, but once you have insight into the actual values used for a particular test run then it isn't as much of an issue.
The notion of "arbitrary" data is also very useful as a way of signifying something that is not important. We have some acceptance tests that come to mind where there is a lot of noise data that is no relevance to the test at hand.
I think the problem here is that the purpose of unit tests is not catching bugs. The purpose is being able to change the code without breaking it - so how are you going to know that you broke your code when your random unit tests are green in your pipeline, just because they didn't happen to touch the right path?
To me, doing this is insane. A different situation would be running them as integration tests or e2e tests, not as part of the build, and only for some specific things, because in some situations you will need a mirror of your code in your asserts to test that way.
And having a test suite as complex as your real code is like not having tests at all, because who is going to test your suite then? :p
A unit test is there to ensure the correct behaviour in response to particular inputs, in particular all code paths/logic should be covered. There is no need to use random data to achieve this. If you don't have 100% code coverage with your unit tests, then fuzz testing by the back door is not going to achieve this, and it may even mean you occasionally don't achieve your desired code coverage. It may (pardon the pun) give you a 'fuzzy' feeling that you're getting to more code paths, but there may not be much science behind this. People often check code coverage when they run their unit tests for the first time and then forget about it (unless enforced by CI), so do you really want to be checking coverage against every run as a result of using random input data? It's just another thing to potentially neglect.
Also, programmers tend to take the easy path, and they make mistakes. They make just as many mistakes in unit tests as they do in the code under test. It's way too easy for someone to introduce random data and then tailor the asserts to the output order of a single run. Admit it, we've all done this. When the data changes, the order can change and the asserts fail, so a portion of the executions fail. This portion needn't be 1/2; I've seen exactly this result in failures 10% of the time. It takes a long time to track down problems like this, and if your CI doesn't record enough data about enough of the runs, then it can be even worse.
Whilst there's an argument for saying "just don't make these mistakes", in a typical commercial programming setup there'll be a mix of abilities, sometimes relatively junior people reviewing code for other junior people. You can write literally dozens more tests in the time it takes to debug one non-deterministic test and fix it, so make sure you don't have any. Don't use random data.
In my experience unit tests and randomized tests should be separated. Unit tests serve to give a certainty of the correctness of some cases, not only to catch obscure bugs.
All that said, randomized testing is useful and should be done, separately from unit tests, but it should test a series of randomized values.
I can't help thinking that testing one random value on every run is just not enough - neither sufficient as a randomized test, nor truly useful as a unit test.
Another aspect is validating the test results. If you have random inputs, you have to calculate the expected output for it inside the test. This will at some level duplicate the tested logic, making the test only a mirror of the tested code itself. This will not sufficiently test the code, since the test might contain the same errors the original code does.
This is an old question, but I wanted to mention a library I created that generates objects filled with random data. It supports reproducing the same data if a test fails by supplying a seed. It also supports JUnit 5 via an extension.
Example usage:
Person person = Instancio.create(Person.class);
Or a builder API for customising generation parameters:
Person person = Instancio.of(Person.class)
.generate(field("age"), gen -> gen.ints.min(18).max(65))
.create();
Github link has more examples: https://github.com/instancio/instancio
You can find the library on maven central:
<dependency>
<groupId>org.instancio</groupId>
<artifactId>instancio-junit</artifactId>
<version>LATEST</version>
</dependency>
Depending on your object/app, random data would have a place in load testing. I think more important would be to use data that explicitly tests the boundary conditions of the data.
We just ran into this today. I wanted pseudo-random data (so it would look like compressed audio data in terms of size). I left a TODO noting that I also wanted it to be deterministic. rand() was different on OSX than on Linux, and unless I re-seeded, it could change at any time. So we changed it to be deterministic but still pseudo-random: the test is repeatable, as much as using canned data (but more conveniently written).
This was NOT testing by some random brute force through code paths. That's the difference: still deterministic, still repeatable, still using data that looks like real input to run a set of interesting checks on edge cases in complex logic. Still unit tests.
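A sketch of that deterministic-but-pseudo-random approach in Python, avoiding the platform-dependent rand(); the seed and size here are made up:

import random

def fake_compressed_audio(num_bytes, seed=0xC0FFEE):
    """Repeatable pseudo-random bytes that resemble compressed audio in size/shape."""
    rng = random.Random(seed)            # independent of the platform's rand()
    return bytes(rng.randrange(256) for _ in range(num_bytes))

# Same seed, same data, regardless of platform.
assert fake_compressed_audio(1024) == fake_compressed_audio(1024)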
Does that still qualify as random? Let's talk over beer. :-)
I can envisage three solutions to the test data problem:
Test with fixed data
Test with random data
Generate random data once, then use it as your fixed data
I would recommend doing all of the above. That is, write repeatable unit tests with both some edge cases worked out using your brain, and some randomised data which you generate only once. Then write a set of randomised tests that you run as well.
The randomised tests should never be expected to catch something your repeatable tests miss. You should aim to cover everything with repeatable tests, and consider the randomised tests a bonus. If they find something, it should be something that you couldn't have reasonably predicted; a real oddball.
How can your guy run the test again when it has failed to see if he has fixed it? I.e. he loses repeatability of tests.
While I think there is probably some value in flinging a load of random data at tests, as mentioned in other replies it falls more under the heading of load testing than anything else. It is pretty much a "testing-by-hope" practice. I think that, in reality, your guy is simply not thinking about what he is trying to test, and making up for that lack of thought by hoping randomness will eventually trap some mysterious error.
So the argument I would use with him is that he is being lazy. Or, to put it another way, if he doesn't take the time to understand what he is trying to test it probably shows he doesn't really understand the code he is writing.