Random data in Unit Tests?

Random data in Unit Tests? - unit-testing

I have a coworker who writes unit tests for objects which fill their fields with random data. His reason is that it gives a wider range of testing, since it will test a lot of different values, whereas a normal test only uses a single static value.
I've given him a number of different reasons against this, the main ones being:
random values means the test isn't truly repeatable (which also means that if the test can randomly fail, it can do so on the build server and break the build)
if it's a random value and the test fails, we need to a) fix the object and b) force ourselves to test for that value every time, so we know it works, but since it's random we don't know what the value was
Another coworker added:
If I am testing an exception, random values will not ensure that the test ends up in the expected state
random data is used for flushing out a system and load testing, not for unit tests
Can anyone else add additional reasons I can give him to get him to stop doing this?
(Or alternately, is this an acceptable method of writing unit tests, and I and my other coworker are wrong?)

There's a compromise. Your coworker is actually onto something, but I think he's doing it wrong. I'm not sure that totally random testing is very useful, but it's certainly not invalid.
A program (or unit) specification is a hypothesis that there exists some program that meets it. The program itself is then evidence of that hypothesis. What unit testing ought to be is an attempt to provide counter-evidence to refute that the program works according to the spec.
Now, you can write the unit tests by hand, but it really is a mechanical task. It can be automated. All you have to do is write the spec, and a machine can generate lots and lots of unit tests that try to break your code.
I don't know what language you're using, but see here:
Java
http://functionaljava.org/
Scala (or Java)
http://github.com/rickynils/scalacheck
Haskell
http://www.cs.chalmers.se/~rjmh/QuickCheck/
.NET:
http://blogs.msdn.com/dsyme/archive/2008/08/09/fscheck-0-2.aspx
These tools will take your well-formed spec as input and automatically generate as many unit tests as you want, with automatically generated data. They use "shrinking" strategies (which you can tweak) to find the simplest possible test case to break your code and to make sure it covers the edge cases well.
Happy testing!

This kind of testing is called a Monkey test. When done right, it can smoke out bugs from the really dark corners.
To address your concerns about reproducibility: the right way to approach this, is to record the failed test entries, generate a unit test, which probes for the entire family of the specific bug; and include in the unit test the one specific input (from the random data) which caused the initial failure.

There is a half-way house here which has some use, which is to seed your PRNG with a constant. That allows you to generate 'random' data which is repeatable.
Personally I do think there are places where (constant) random data is useful in testing - after you think you've done all your carefully-thought-out corners, using stimuli from a PRNG can sometimes find other things.

I am in favor of random tests, and I write them. However, whether they are appropriate in a particular build environment and which test suites they should be included in is a more nuanced question.
Run locally (e.g., overnight on your dev box) randomized tests have found bugs both obvious and obscure. The obscure ones are arcane enough that I think random testing was really the only realistic one to flush them out. As a test, I took one tough-to-find bug discovered via randomized testing and had a half dozen crack developers review the function (about a dozen lines of code) where it occurred. None were able to detect it.
Many of your arguments against randomized data are flavors of "the test isn't reproducible". However, a well written randomized test will capture the seed used to start the randomized seed and output it on failure. In addition to allowing you to repeat the test by hand, this allows you to trivially create new test which test the specific issue by hardcoding the seed for that test. Of course, it's probably nicer to hand-code an explicit test covering that case, but laziness has its virtues, and this even allows you to essentially auto-generate new test cases from a failing seed.
The one point you make that I can't debate, however, is that it breaks the build systems. Most build and continuous integration tests expect the tests to do the same thing, every time. So a test that randomly fails will create chaos, randomly failing and pointing the fingers at changes that were harmless.
A solution then, is to still run your randomized tests as part of the build and CI tests, but run it with a fixed seed, for a fixed number of iterations. Hence the test always does the same thing, but still explores a bunch of the input space (if you run it for multiple iterations).
Locally, e.g., when changing the concerned class, you are free to run it for more iterations or with other seeds. If randomized testing ever becomes more popular, you could even imagine a specific suite of tests which are known to be random, which could be run with different seeds (hence with increasing coverage over time), and where failures wouldn't mean the same thing as deterministic CI systems (i.e., runs aren't associated 1:1 with code changes and so you don't point a finger at a particular change when things fail).
There is a lot to be said for randomized tests, especially well written ones, so don't be too quick to dismiss them!

If you are doing TDD then I would argue that random data is an excellent approach. If your test is written with constants, then you can only guarantee your code works for the specific value. If your test is randomly failing the build server there is likely a problem with how the test was written.
Random data will help ensure any future refactoring will not rely on a magic constant. After all, if your tests are your documentation, then doesn't the presence of constants imply it only needs to work for those constants?
I am exaggerating however I prefer to inject random data into my test as a sign that "the value of this variable should not affect the outcome of this test".
I will say though that if you use a random variable then fork your test based on that variable, then that is a smell.

In the book Beautiful Code, there is a chapter called "Beautiful Tests", where he goes through a testing strategy for the Binary Search algorithm. One paragraph is called "Random Acts of Testing", in which he creates random arrays to thoroughly test the algorithm. You can read some of this online at Google Books, page 95, but it's a great book worth having.
So basically this just shows that generating random data for testing is a viable option.

Your co-worker is doing fuzz-testing, although he doesn't know about it. They are especially valuable in server systems.

One advantage for someone looking at the tests is that arbitrary data is clearly not important. I've seen too many tests that involved dozens of pieces of data and it can be difficult to tell what needs to be that way and what just happens to be that way. E.g. If an address validation method is tested with a specific zip code and all other data is random then you can be pretty sure the zip code is the only important part.

if it's a random value and the test fails, we need to a) fix the object and b) force ourselves to test for that value every time, so we know it works, but since it's random we don't know what the value was
If your test case does not accurately record what it is testing, perhaps you need to recode the test case. I always want to have logs that I can refer back to for test cases so that I know exactly what caused it to fail whether using static or random data.

You should ask yourselves what is the goal of your test.
Unit tests are about verifying logic, code flow and object interactions. Using random values tries to achieve a different goal, thus reduces test focus and simplicity. It is acceptable for readability reasons (generating UUID, ids, keys,etc.).
Specifically for unit tests, I cannot recall even once this method was successful finding problems. But I have seen many determinism problems (in the tests) trying to be clever with random values and mainly with random dates.
Fuzz testing is a valid approach for integration tests and end-to-end tests.

Can you generate some random data once (I mean exactly once, not once per test run), then use it in all tests thereafter?
I can definitely see the value in creating random data to test those cases that you haven't thought of, but you're right, having unit tests that can randomly pass or fail is a bad thing.

If you're using random input for your tests you need to log the inputs so you can see what the values are. This way if there is some edge case you come across, you can write the test to reproduce it. I've heard the same reasons from people for not using random input, but once you have insight into the actual values used for a particular test run then it isn't as much of an issue.
The notion of "arbitrary" data is also very useful as a way of signifying something that is not important. We have some acceptance tests that come to mind where there is a lot of noise data that is no relevance to the test at hand.

I think the problem here is that the purpose of unit tests is not catching bugs. The purpose is being able to change the code without breaking it, so how are you going to know that you break
your code when your random unit tests are green in your pipeline, just because they doesn't touch the right path?
Doing this is insane for me. A different situation could be running them as integration tests or e2e not as a part of the build, and just for some specific things because in some situations you will need a mirror of your code in your asserts to test that way.
And having a test suite as complex as your real code is like not having tests at all because who
is going to test your suite then? :p

A unit test is there to ensure the correct behaviour in response to particular inputs, in particular all code paths/logic should be covered. There is no need to use random data to achieve this. If you don't have 100% code coverage with your unit tests, then fuzz testing by the back door is not going to achieve this, and it may even mean you occasionally don't achieve your desired code coverage. It may (pardon the pun) give you a 'fuzzy' feeling that you're getting to more code paths, but there may not be much science behind this. People often check code coverage when they run their unit tests for the first time and then forget about it (unless enforced by CI), so do you really want to be checking coverage against every run as a result of using random input data? It's just another thing to potentially neglect.
Also, programmers tend to take the easy path, and they make mistakes. They make just as many mistakes in unit tests as they do in the code under test. It's way too easy for someone to introduce random data, and then tailor the asserts to the output order in a single run. Admit it, we've all done this. When the data changes the order can change and the asserts fail, so a portion of the executions fail. This portion needn't be 1/2 I've seen exactly this result in failures 10% of the time. It takes a long time to track down problems like this, and if your CI doesn't record enough data about enough of the runs, then it can be even worse.
Whilst there's an argument for saying "just don't make these mistakes", in a typical commercial programming setup there'll be a mix of abilities, sometimes relatively junior people reviewing code for other junior people. You can write literally dozens more tests in the time it takes to debug one non-deterministic test and fix it, so make sure you don't have any. Don't use random data.

In my experience unit tests and randomized tests should be separated. Unit tests serve to give a certainty of the correctness of some cases, not only to catch obscure bugs.
All that said, randomized testing is useful and should be done, separately from unit tests, but it should test a series of randomized values.
I can't help to think that testing 1 random value with every run is just not enough, neither to be a sufficient randomized test, neither to be a truly useful unit test.
Another aspect is validating the test results. If you have random inputs, you have to calculate the expected output for it inside the test. This will at some level duplicate the tested logic, making the test only a mirror of the tested code itself. This will not sufficiently test the code, since the test might contain the same errors the original code does.

This is an old question, but I wanted to mention a library I created that generates objects filled with random data. It supports reproducing the same data if a test fails by supplying a seed. It also supports JUnit 5 via an extension.
Example usage:
Person person = Instancio.create(Person.class);
Or a builder API for customising generation parameters:
Person person = Instancio.of(Person.class)
.generate(field("age"), gen -> gen.ints.min(18).max(65))
.create();
Github link has more examples: https://github.com/instancio/instancio
You can find the library on maven central:
<dependency>
<groupId>org.instancio</groupId>
<artifactId>instancio-junit</artifactId>
<version>LATEST</version>
</dependency>

Depending on your object/app, random data would have a place in load testing. I think more important would be to use data that explicitly tests the boundary conditions of the data.

We just ran into this today. I wanted pseudo-random (so it would look like compressed audio data in terms of size). I TODO'd that I also wanted deterministic. rand() was different on OSX than on Linux. And unless I re-seeded, it could change at any time. So we changed it to be deterministic but still psuedo-random: the test is repeatable, as much as using canned data (but more conveniently written).
This was NOT testing by some random brute force through code paths. That's the difference: still deterministic, still repeatable, still using data that looks like real input to run a set of interesting checks on edge cases in complex logic. Still unit tests.
Does that still qualify is random? Let's talk over beer. :-)

I can envisage three solutions to the test data problem:
Test with fixed data
Test with random data
Generate random data once, then use it as your fixed data
I would recommend doing all of the above. That is, write repeatable unit tests with both some edge cases worked out using your brain, and some randomised data which you generate only once. Then write a set of randomised tests that you run as well.
The randomised tests should never be expected to catch something your repeatable tests miss. You should aim to cover everything with repeatable tests, and consider the randomised tests a bonus. If they find something, it should be something that you couldn't have reasonably predicted; a real oddball.

How can your guy run the test again when it has failed to see if he has fixed it? I.e. he loses repeatability of tests.
While I think there is probably some value in flinging a load of random data at tests, as mentioned in other replies it falls more under the heading of load testing than anything else. It is pretty much a "testing-by-hope" practice. I think that, in reality, your guy is simply not thinkng about what he is trying to test, and making up for that lack of thought by hoping randomness will eventually trap some mysterious error.
So the argument I would use with him is that he is being lazy. Or, to put it another way, if he doesn't take the time to understand what he is trying to test it probably shows he doesn't really understand the code he is writing.

Related

Using randomness and/or iterations in unit tests?

In unit tests, I have become used to test methods applying some regular values, some values offending the method contract, and all border-cases I can come up with.
But is it very bad-practice to
test on random values, this is a value within a range you think should never give any trouble, so that each time the test runs, another value is passed in? As a kind of extensive testing of regular values?
test on whole ranges, using iteration ?
I have a feeling both of this approaches aren't any good. With range-testing I can imagine that it's just not practical to do that, since the time it is taking, but with randomness?
UPDATE :
I'm not using this technique myself, was just wondering about it. Randomness can be a good tool, I know now, if you can make it reproduceable when you need to.
The most interesting reply was the 'fuzzing' tip from Lieven :
http://en.wikipedia.org/wiki/Fuzz_testing
tx

Unit tests need to be fast. if they aren't people won't run them regularly. At times I did code for checking the whole range but #Ignore'd commented it out in the end because it made the tests too slow. If I were to use random values, I would go for a PRNG with fixed seeds so that every run actually checks the same numbers.

Random Input - The tests would not be repeatable (produce consistent results every time they are run and hence are not considered good unit tests. Tests should not change their mind.
Range tests / RowTests - are good as long as they dont slow down the test suite run.. each test should run as fast as possible. (A done-in-30sec test suite gets run more often than a 10 min one) - preferably 100ms or less. That said Each input (test data) should be 'representative' input. If all input values are the same, testing each one isn't adding any value and is just routine number crunching. You just need one representative from that set of values. You also need representatives for boundary conditions and 'special' values.
For more on guidelines or thumbrules - see 'What makes a Good Unit Test?'
That said... the techniques you mentioned could be great to find representative inputs.. So use them to find scenarioX where code fails or succeeds incorrectly - then write up a repeatable,quick,tests-one-thing-only unit test for that scenarioX and add it to your test suite. If you find that these tools continue to help you find more good test-cases.. persist with them.
Response to OP's clarification:
If you use the same seed value (test input) for your random no generator on each test run, your test is not random - values can be predetermined. However a unit test ideally shouldn't need any input/output - that is why xUnit test cases have the void TC() signature.
If you use different seed values on each run, now your tests are random and not repeatable. Of course you can hunt down the special seed value in your log files to know what failed (and reproduce the error) but I like my tests to instantly let me know what failed - e.g. a Red TestConversionForEnums() lets me know that the Enum Conversion code is broken without any inspection.
Repeatable - implies that each time the test is run on the SUT, it produces the same result (pass/fail).. not 'Can I reproduce test failure again?' (Repeatable != Reproducible). To reiterate.. this kind of exploratory testing may be good to identify more test cases but I wouldn't add this to my test suite that I run each time I make a code change during the day. I'd recommend doing exploratory testing manually, find some good (some may use sadistic) Testers that'll go hammer and tongs at your code.. will find you more test cases than a random input generator.

I have been using randomness in my testcases. It found me some errors in the SUT and it gave me some errors in my testcase.
Note that the testcase get more complex by using randomnes.
You'll need a method to run your testcase with the random value(s) it failed on
You'll need to log the random values used for every test.
...
All in all, I'm throthling back on using randomness but not dismissing it enterly. As with every technique, it has its values.
For a better explanation of what you are after, look up the term fuzzing

What you describe is usually called specification-based testing and has been implemented by frameworks such as QuickCheck (Haskell), scalacheck (Scala) and Quviq QuickCheck (Erlang).
Data-based testing tools (such as DataProvider in TestNG) can achieve similar results.
The underlying principle is to generate input data for the subject under test based upon some sort of specification and is far from "bad practice".

What are you testing? The random number generator? Or your code?
If your code, what if there is a bug in the code that produces random numbers?
What if you need to reproduce a problem, do you keep restarting the test hoping that it will eventually use the same sequence as you had when you discovered the problem?
If you decide to use a random number generator to produce data, at least seed it with a known constant value, so it's easy to reproduce.
In other words, your "random numbers" are just a "sequence of numbers I really don't care all that much about".

So long as it will tell you in some way what random value it failed on I don't suppose it's that bad. However, you're almost relying on luck to find a problem in your application then.
Testing the whole range will ensure you have every avenue covered but it seems like overkill when you have the edges covered and, I assume, a few middle-ground accepted values.

The goal of unit-testing is to get confidence in your code. Therefore, if you feel that using random values could help you find some more bugs, you obviously need more tests to increase your confidence level.
In that situation, you could rely on iteration-based testing to identify those problems.
I'd recommend creating new specific tests for the cases discovered with the loop testing, and removing the iteration-based tests then; so that they don't slow down your tests.

I have used randomness for debugging a field problem with a state machine leaking a resource. We code inspected, ran the unit-tests and couldn't reproduce the leak.
We fed random events from the entire possible event space into the state machine unit test environment. We looked at the invariants after each event and stopped when they were violated.
The random events eventually exposed a sequence of events that produced a leak.
The state machine leaked a resource when a 2nd error occurred while recovering from a first error.
We were then able to reproduce the leak in the field.
So randomness found a problem that was difficult to find otherwise. A little brute force but the computer didn't mind working the weekend.

I wouldn't advocate completely random values as it will give you a false sense of security. If you can't go through the entire range (which is often the case) it's far more efficient to select a subset by hand. This way you will also have to think of possible "odd" values, values that causes the code to run differently (and are not near edges).
You could use a random generator to generate the test values, check that they represent a good sample and then use them. This is a good idea especially if choosing by hand would be too time-consuming.
I did use random test values when I wrote a semaphore driver to use for a hw block from two different chips. In this case I couldn't figure out how to choose meaningful values for the timings so I randomized how often the chips would (independently) try to access the block. In retrospect it would still have been better to choose them by hand, because getting the test environment to work in such a way that the two chips didn't align themselves was not as simple as I thought. This was actually a very good example of when random values do not create a random sample.
The problem was caused by the fact that whenever the other chip had reserved the block the other waited and true to a semaphore got access right after the other released it. When I plotted how long the chips had to wait for access the values were in fact far from random. Worst was when I had the same value range for both random values, it got slightly better after I changed them to have different ranges, but it still wasn't very random. I started getting something of a random test only after I randomized both the waiting times between accesses and how long the block was reserved and chose the four sets carefully.
In the end I probably ended up using more time writing the code to use "random" values than I would have used to pick meaningful values by hand in the first place.

See David Saff's work on Theory-Based Testing.
Generally I'd avoid randomness in unit tests, but the theory stuff is intriguing.

The 'key' point here is unit test. A slew of random values in the expected ranged as well as edges for the good case and ouside range/boundary for bad case is valuable in a regression test, provided the seed is constant.
a unit test may use random values in the expected range, if it is possible to always save the inputs/outputs (if any) before and after.

How does unit testing work when the program doesn't lend itself to a functional style?

I'm thinking of the case where the program doesn't really compute anything, it just DOES a lot. Unit testing makes sense to me when you're writing functions which calculate something and you need to check the result, but what if you aren't calculating anything? For example, a program I maintain at work relies on having the user fill out a form, then opening an external program, and automating the external program to do something based on the user input. The process is fairly involved. There's like 3000 lines of code (spread out across multiple functions*), but I can't think of a single thing which it makes sense to unit test.
That's just an example though. Should you even try to unit test "procedural" programs?
*EDIT

Based on your description these are the places I would look to unit test:
Does the form validation work of user input work correctly
Given valid input from the form is the external program called correctly
Feed in user input to the external program and see if you get the right output
From the sounds of your description the real problem is that the code you're working with is not modular. One of the benefits I find with unit testing is that it code that is difficult to test is either not modular enough or has an awkward interface. Try to break the code down into smaller pieces and you'll find places where it makes sense to write unit tests.

I'm not an expert on this but have been confused for a while for the same reason. Somehow the applications I'm doing just don't fit to the examples given for UNIT testing (very asynchronous and random depending on heavy user interaction)
I realized recently (and please let me know if I'm wrong) that it doesn't make sense to make a sort of global test but rather a myriad of small tests for each component. The easiest is to build the test in the same time or even before creating the actual procedures.

Do you have 3000 lines of code in a single procedure/method? If so, then you probably need to refactor your code into smaller, more understandable pieces to make it maintainable. When you do this, you'll have those parts that you can and should unit test. If not, then you already have those pieces -- the individual procedures/methods that are called by your main program.
Even without unit tests, though, you should still write tests for the code to make sure that you are providing the correct inputs to the external program and testing that you handle the outputs from the program correctly under both normal and exceptional conditions. Techniques used in unit testing -- like mocking -- can be used in these integration tests to ensure that your program is operating correctly without involving the external resource.

An interesting "cut point" for your application is you say "the user fills out a form." If you want to test, you should refactor your code to construct an explicit representation of that form as a data structure. Then you can start collecting forms and testing that the system responds appropriately to each form.
It may be that the actions taken by your system are not observable until something hits the file system. Here are a couple of ideas:
Set up something like a git repository for the initial state of the file system, run a form, and look at the output of git diff. It's likely this is going to feel more like regression testing than unit testing.
Create a new module whose only purpose is to make your program's actions observable. This can be as simple as writing relevant text to a log file or as complex as you like. If necessary, you can use conditional compilation or linking to ensure this module does something only when the system is under test. This is closer to traditional unit testing as you can now write tests that say upon receiving form A, the system should take sequence of actions B. Obviously you have to decide what actions should be observed to form a reasonable test.
I suspect you'll find yourself migrating toward something that looks more like regression testing than unit testing per se. That's not necessarily bad. Don't overlook code coverage!
(A final parenthetical remark: in the bad old days of interactive console applications, Don Libes created a tool called Expect, which was enormously helpful in allowing you to script a program that interacted like a user. In my opinion we desperately need something similar for interacting with web pages. I think I'll post a question about this :-)

You don't necessarily have to implement automated tests that test individual methods or components. You could implement an automated unit test that simulates a user interacting with your application, and test that your application responds in the correct way.
I assume you are manually testing your application currently, if so then think about how you could automate that and work from there. Over time you should be able to break your tests into progressively smaller chunks that test smaller sections of code. Any sort of automated testing is usually a lot better than nothing.

Most programs (regardless of the language paradigm) can be broken into atomic units which take input and provide output. As the other responders have mentioned, look into refactoring the program and breaking it down into smaller pieces. When testing, focus less on the end-to-end functionality and more on the individual steps in which data is processed.
Also, a unit doesn't necessarily need to be an individual function (though this is often the case). A unit is a segment of functionality which can be tested using inputs and measuring outputs. I've seen this when using JUnit to test Java APIs. Individual methods might not necessarily provide the granularity I need for testing, though a series of method calls will. Therefore, the functionality I regard as a "unit" is a little greater than a single method.

You should at least refactor out the stuff that looks like it might be a problem and unit test that. But as a rule, a function shouldn't be that long. You might find something that is unit test worthy once you start refactoring
Good object mentor article on TDD

As a few have answered before, there are a few ways you can test what you have outlined.
First the form input, can be tested in a few ways.
What happens if invalid data is inputted, valid data, etc.
Then each of the function can be tested to see if the functions when supplied with various forms of correct and incorrect data react in the proper manner.
Next you can mock the application that are being called so that you can make sure that your application send and process data to the external programs correctly. Don't for get to make sure your program deals with unexpected data from the external program as well.
Usually, the way I figure out how I want to write tests for a program I have been assigned to maintain, is to see what I am do manually to test the program. Then try and figure how to automate as much of it as possible. Also, don't restrict your testing tools just to the programming language you are writing the code in.

I think a wave of testing paranoia is spreading :) Its good to examine things to see if tests would make sense, sometimes the answer is going to be no.
The only thing that I would test is making sure that bogus form input is handled correctly.. I really don't see where else an automated test would help. I think you'd want the test to be non invasive (i.e. no record is actually saved during testing), so that might rule out the other few possibilities.

If you can't test something how do you know that it works? A key to software design is that the code should be testable. That may make the actual writing of the software more difficult, but it pays off in easier maintenance later.

Unit Test Connundrum

I'm looking to unit testing as a means of regression testing on a project.
However, my issue is that the project is basically a glorified DIR command -- it performs regular expression tests and MD5 filters on the results, and allows many criteria to be specified, but the entire thing is designed to process input from the system on which it runs.
I'm also a one-man development team, and I question the value of a test for code written by me which is written by me.
Is unit testing worthwhile in this situation? If so, how might such tests be accomplished?
EDIT: MD5 and Regex functions aren't provided by me -- they are provided by the Crypto++ library and Boost, respectively. Therefore I don't gain much by testing them. Most of the code I have simply feeds data into the libraries, and the prints out the results.

The value of test-after, the way you are asking, can indeed be limited in certain circumstances, but the way to unit test, from the description would be to isolate the regular expression tests and MD5 filters into one section of code, and abstract the feeding of the input so that in production it feeds from the system, and during the unit test, your test class passes in that input.
You then collect a sampling of the different scenarios you intend to support, and feed those in via different unit tests that exercise each scenario.
I think the value of the unit test will come through if you have to change the code to handle new scenarios. You will be confident that the old scenarios don't break as you make changes.

Is unit testing worthwhile in this situation?
Not necessarily: especially for a one-man team I think it may be sufficient to have automated testing of something larger than a "unit" ... further details at "Should one test internal implementation, or only test public behaviour?"

Unit testing can still provide value in a one-man show. It gives you confidence in the functionality and correctness (at some level) of the module. But some design considerations may be needed to help make testing more applicable to your code. Modularization makes a big difference, especially if combined with some kind of dependency injection, instead of tight coupling. This allows test versions of collaborators to be used for testing a module in isolation. In your case, a mock file system object could return a predictable set of data, so your filtering and criteria code can be evaluated.

The value of regression testing is often not realized until it's automated. Once that's done, things become a lot easier.
That means you have to be able to start from a known position (if you're generating MD5s on files, you have to start with the same files each time). Then get one successful run where you can save the output - that's the baseline.
From that point on, regression testing is simply a push-button job. Start your test, collect the output and compare it to your known baseline (of course, if the output ever changes, you'll need to check it manually, or with another independent script before saving it as the new baseline).
Keep in mind the idea of regression testing is to catch any bugs introduced by new code (i.e., regressing the software). It's not to test the functionality of that new code.
The more you can automate this, the better, even as a one-man development team.

When you were writing the code, did you test it as you went? Those tests could be written into an automated script, so that when you have to make a change in 2 months time, you can re-run all those tests to ensure you haven't regressed something.
In my experience the chance of regression increases sharply depending on how much time goes by after the time you finish version 1 and start coding version 2, because you'll typically forget the subtle nuances of how it works under different conditions - your unit tests are a way of encoding those nuances.

An integration test against the filesystem would be worthwhile. Just make sure it does what it needs to do.

Is unit testing valuable in a one-man shop scenario? Absolutely! There's nothing better than being able to refactor your code with absolute confidence you haven't broken anything.
How do I unit test this? Mock the system calls, and then test the rest of your logic.

I question the value of a test for code written by me which is written by me
Well, that's true now but in a year it will be you, the one-year-more-experienced developer developing against software written by you-now, the less experienced and knowledgeable developer (by comparison). Won't you want the code written by that less experienced guy (you a year ago) to be properly tested so you can make changes with confidence that nothing has broken?

How is unit testing better than just testing the entire output of your application as a whole?

I don't understand how an unit test could possibly benefit.
Isn't it sufficient for a tester to test the entire output as a whole rather than doing unit tests?
Thanks.

What you are describing is integration testing. What integration testing will not tell you is which piece of your massive application is not working correctly when your output is no longer correct.
The advantage to unit testing is that you can write a test for each business assumption or algorithm step that you need your program to perform. When someone adds or changes code to your application, you immediately know exactly which step, which piece, and maybe even which line of code is broken when a bug is introduced. The time savings on maintenence for that reason alone makes it worthwhile, but there is an even bigger advantage in that regression bugs cannot be introduced (assuming your tests are running automatically when you build your software). If you fix a bug, and then write a test specifically to catch that bug in the future, there is no way someone could accidentally introduce it again.
The combination of integration testing and unit testing can let you sleep much easier at night, especially when you've checked in a big piece of code that day.

The earlier you catch bugs, the cheaper they are to fix. A bug found during unit testing by the coder is pretty cheap (just fix the darn thing).
A bug found during system or integration testing costs more, since you have to fix it and restart the test cycle.
A bug found by your customer will cost a lot: recoding, retesting, repackaging and so forth. It may also leave a painful boot print on your derriere when you inform management that you didn't catch it during unit testing because you didn't do any, thinking that the system testers would find all the problems :-)
How much money would it cost GM to recall 10,000 cars because the catalytic converter didn't work properly?
Now think of how much it would cost them if they discovered that immediately after those converters were delivered to them, but before they were put into those 10,000 cars.
I think you'll find the latter option to be quite a bit cheaper.
That's one reason why test driven development and continuous integration are (sometimes) a good thing - testing is done all the time.
In addition, unit tests don't check that the program works as a whole, just that each little bit performs as expected. That's often quite a lot more than higher level tests would check.

From my experience:
Integration and functional testing tend to be more indicative of the overall quality of the system, than unit test suit is.
High level testing (functional, acceptance) is a QA tool.
Unit testing is a development tool. Especially in a TDD context, where unit test becomes more of a design implement, rather than that of a quality assurance.
As a result of better design, quality of the entire system improves (indirectly).
Passing unit test suite is meant to ensure that a single component conforms to the developer's intentions (correctness). Acceptance test is the level that covers validity of the system (i.e. system does what user want it to do).
Summary:
Unit test is meant as a development tool first, QA tool second.
Acceptance test is meant as a QA tool.

There is still a need for a certain level of manual testing to be performed but unit testing is used to decrease the number of defects that make it to that stage. Unit testing tests the smallest parts of the system and if they all work the chances of the application as a whole working correctly are increased significantly.
It also assists when adding new features since regression testing can be performed quickly and automatically.

For a complex enough application, testing the entire output as a whole may not cover enough different possibilities. For example, any given application has a huge number of different code paths that can be followed depending on input. In typical testing, there may be many parts of your code that are simply never encountered, because they are only used in certain circumstances, so you can't be sure that any code that isn't run in your test situation, actually works. Also, errors in one section of code may be masked a majority of the time by something else in another section of code, so you may never discover some errors.
It is better to test each function or class separately. That way, the test is easier to write, because you are only testing a certain small section of the code. It's also easier to cover every possible code path when testing, and if you test each small part separately then you can detect errors even when those errors would often be masked by other parts of your code when run in your application.

Do yourself a favor and try out unit testing first. I was quite the skeptic myself until I realized just how darned helpful/powerful unit-tests can be. If you think about it, they aren't really there to add to your workload. They are there to provide you with peace of mind and allow you to continue extending your application while ensuring that your code is solid. You get immediate feedback as to when you may have broke something and this is something of extraordinary value.
To your question regarding why to test small sections of code consider this: Suppose your giant app uses a cool XOR encryption scheme that you wrote and eventually product management changes the requirements of how you generate these encrypted strings. So you say: "Heck, I wrote the the encryption routine so I'll go ahead and make the change. It'll take me 15 minutes and we'll all go home and have a party." Well, perhaps you introduced a bug during this process. But wait!!! Your handy dandy TestXOREncryption() test method immediately tells you that the expected output did not match the input. Bingo, this is why you broke down your unit tests ahead of time into small "units" to test for because in your big giant application you would not have figured this out nearly as fast.
Also, once you get into the frame of mind of regularly writing unit tests you'll realize that although you pay an upfront cost in the beginning in terms of time, you'll get that back 10 fold later in the development cycle when you can quickly identify areas in your code that have introduced problems.
There is no magic bullet with unit tests because your ability to identify problems is only as good as the tests you write. It boils down to delivering a better product and relieving yourself of stress and headaches. =)

Agree with most of the answers. Let's drill down on the topic of speed. Here are some real numbers:
Unit test results in 1 or 2 minutes from a
fresh compile. As true unit tests
(no interaction with external
systems like dbs) they can cover a
lot of logic really fast.
Automated functional test results in 1 or 2 hours. These run on a simplified platform, but sometimes cover multiple systems and the database - which really kills the speed.
Automated integration test results once a day. These exercise the full meal deal, but are so heavy and slow, we can only execute them once a day and it takes a few hours.
Manual regression results come in after a few weeks. We get stuff over to testers a few times a day, but your change isn't realistically regressed for week or two at best.
I want to find out what I broke in 1 or 2 minutes, not a few weeks, not even a few hours. That's where the 10fold ROI on unit tests that people talk about comes from.

This is a tough question to approach because it questions something of such enormous breadth. Here's my short answer, however:
Test Driven Development (or TDD) seeks to prove that every logical unit of an application (or block of code) functions exactly as it should. By making tests as automated as possible for productivity's sake, how could this really be harmful?
By testing every logical piece of code, you can trust the usage of the code up some hierarchy. Say I build an application that relies on a thread-safe stack implementation. Shouldn't the stack be guaranteed to work up at every stage before I build on it?
The key is that if something in the whole application breaks, meaning just looking at the total output/outcome, how do you know where it came from? Well, debugging, of course! Which puts you back where you started. TDD allows you to -hopefully- bypass this most painful stage in development.

Testers generally test end to end functionality. Obviously this is geared for going at user scenarios and has incredible value.
Unit Tests serve a different functionality. The are the developers way of verifying the components they write work correctly in the absence of other features or in combination with other features. This offers a range of value including
Provides un-ignorable documentation
Ability to isolate bugs to specific components
Verify invariants in the code
Provide quick, immediate feedback to changes in the code base.

One place to start is regression testing. Once you find a bug, write a small test that demonstrates the bug, fix it, then make sure the test now passes. In future you can run that test before each release to ensure that the bug has not been reintroduced.
Why do that at a unit level instead of a whole-program level? Speed. In good code it's much faster to isolate a small unit and write a tiny test than to drive a complex program through to the bug point. Then when testing a unit test will generally run significantly faster than an integration test.

Very simply: Unit tests are easier to write, since you're only testing a single method's functionality. And bugs are easier to fix, since you know exactly what method is broken.
But like the other answerers have pointed out, unit tests aren't the end-all-be-all of testing. They're just the smallest piece of the equation.

Probably the single biggest difficulty with software is the sheer number of interacting things, and the most useful technique is to reduce the number of things that have to be considered.
For example, using higher-level languages rather than lower-level improves productivity, because one line is a separate thing, and being able to write a program in fewer lines reduces the number of things.
Procedural programming came about as an attempt to reduce complexity by making it possible to treat a function as a thing. In order to do that, though, we have to be able to think about what the function does in a coherent manner, and with confidence that we're right. (Object-oriented programming does a similar thing, on a larger scale.)
There are several ways to do this. Design-by-contract is a way of exactly specifying what the function does. Using function parameters rather than global variables to call the function and get results reduces the complexity of the function.
Unit testing is one way to verify that the function does what it is supposed to. It's usually possible to test all the code in a function, and sometimes all the execution paths. It is a way to tell if the function works as it should or not. If the function works, we can think about it as a single thing, rather than as multiple things we have to keep track of.
It serves other purposes. Unit tests are usually quick to run, and so can catch bugs quickly, when they're easy to fix. If developers make sure a function passes the tests before being checked in, then the tests are a form of documenting what the function does that is guaranteed correct. The act of creating the tests forces the test writer to think about what the function should be doing. After that, whoever wanted the change can look at the tests to see if he or she was properly understood.
By way of contrast, larger tests are not exhaustive, and so can easily miss lots of bugs. They're bad at localizing bugs. They are usually performed at fairly long intervals, so they may detect a bug some time after it's made. They define parts of the total user experience, but provide no basis to reason about any part of the system. They should not be neglected, but they are not a substitute for unit tests.

As others have stated, the length of the feedback loop and isolation of the problem to a specific component are key benefits of Unit Tests.
Another way that they are complementary to functional tests is how coverage is tracked in some organizations:
Unit tests on code coverage
Functional tests on requirements coverage
Functional tests might miss features that were implemented but are not in the spec.
Being based on the code, Unit tests might miss that a certain feature wasn't implemented, which is where requirements based coverage analysis of Functional testing comes in.
A final point : there are some things that are easier/faster to test at the unit level, especially around error scenarios.

Unit testing will help you identify the source of your bug more clearly and let you know that you have a problem earlier. Both are good to have, but they are different, and unit testing does have benefits.

The software you test is a system. When you are testing it as a whole you are black box testing since you primarily deal with inputs and outputs. Black box testing is great when you have no means of getting inside of the system.
But since you usually do, you create a lot of unit tests that actually test your system as a white box. You can slice system open in many ways and organize your tests depending on system internal structure. White box testing provides you with many more ways of testing and analyzing systems. It's clearly complimentary to Black box testing and should not be considered as an alternative or competing methodology.

Testing with random inputs best practices

NOTE: I mention the next couple of paragraphs as background. If you just want a TL;DR, feel free to skip down to the numbered questions as they are only indirectly related to this info.
I'm currently writing a python script that does some stuff with POSIX dates (among other things). Unit testing these seems a little bit difficult though, since there's such a wide range of dates and times that can be encountered.
Of course, it's impractical for me to try to test every single date/time combination possible, so I think I'm going to try a unit test that randomizes the inputs and then reports what the inputs were if the test failed. Statisically speaking, I figure that I can achieve a bit more completeness of testing than I could if I tried to think of all potential problem areas (due to missing things) or testing all cases (due to sheer infeasability), assuming that I run it enough times.
So here are a few questions (mainly indirectly related to the above ):
What types of code are good candidates for randomized testing? What types of code aren't?
How do I go about determining the number of times to run the code with randomized inputs? I ask this because I want to have a large enough sample to determine any bugs, but don't want to wait a week to get my results.
Are these kinds of tests well suited for unit tests, or is there another kind of test that it works well with?
Are there any other best practices for doing this kind of thing?
Related topics:
Random data in unit tests?

I agree with Federico - randomised testing is counterproductive. If a test won't reliably pass or fail, it's very hard to fix it and know it's fixed. (This is also a problem when you introduce an unreliable dependency, of course.)
Instead, however, you might like to make sure you've got good data coverage in other ways. For instance:
Make sure you have tests for the start, middle and end of every month of every year between 1900 and 2100 (if those are suitable for your code, of course).
Use a variety of cultures, or "all of them" if that's known.
Try "day 0" and "one day after the end of each month" etc.
In short, still try a lot of values, but do so programmatically and repeatably. You don't need every value you try to be a literal in a test - it's fine to loop round all known values for one axis of your testing, etc.
You'll never get complete coverage, but it will at least be repeatable.
EDIT: I'm sure there are places where random tests are useful, although probably not for unit tests. However, in this case I'd like to suggest something: use one RNG to create a random but known seed, and then seed a new RNG with that value - and log it. That way if something interesting happens you will be able to reproduce it by starting an RNG with the logged seed.

With respect to the 3rd question, in my opinion random tests are not well suited for unit testing. If applied to the same piece of code, a unit test should succeed always, or fail always (i.e., wrong behavior due to bugs should be reproducible). You could however use random techniques to generate a large data set, then use that data set within your unit tests; there's nothing wrong with it.

Wow, great question! Some thoughts:
Random testing is always a good confidence building activity, though as you mentioned, it's best suited to certain types of code.
It's an excellent way to stress-test any code whose performance may be related to the number of times it's been executed, or to the sequence of inputs.
For fairly simple code, or code that expects a limited type of input, I'd prefer systematic test that explicitly cover all of the likely cases, samples of each unlikely or pathological case, and all the boundary conditions.

Q1) I found that distributed systems with lots of concurrency are good candidates for randomized testing. It is hard to create all possible scenarios for such applications, but random testing can expose problems that you never thought about.
Q2) I guess you could try to use statistics to build an confidence interval around having discovered all "bugs". But the practical answer is: run your randomized tests as many times as you can afford.
Q3) I have found that randomized testing is useful but after you have written the normal battery of unit, integration and regression tests. You should integrate your randomized tests as part of the normal test suite, though probably a small run. If nothing else, you avoid bit rot in the tests themselves, and get some modicum coverage as the team runs the tests with different random inputs.
Q4) When writing randomized tests, make sure you save the random seed with the results of the tests. There is nothing more frustrating than finding that your random tests caught a bug, and not being able to run the test again with the same input. Make sure your test can either be executed with the saved seed too.

A few things:
With random testing, you can't really tell how good a piece of code is, but you can tell how bad it is.
Random testing is better suited for things that have random inputs -- a prime example is anything that's exposed to users. So, for example, something that randomly clicks & types all over your app (or OS) is a good test of general robustness.
Similarly, developers count as users. So something that randomly assembles a GUI from your framework is another good candidate.
Again, you're not going to find all the bugs this way -- what you're looking for is "if I do a million whacky things, do ANY of them result in system corruption?" If not, you can feel some level of confidence that your app/OS/SDK/whatever might hold up to a few days' exposure to users.
...But, more importantly, if your random-beater-upper test app can crash your app/OS/SDK in about 5 minutes, that's about how long you'll have until the first fire-drill if you try to ship that sucker.
Also note: REPRODUCIBILITY IS IMPORTANT IN TESTING! Hence, have your test-tool log the random-seed that it used, and have a parameter to start with the same seed. In addition, have it either start from a known "base state" (i.e., reinstall everything from an image on a server & start there) or some recreatable base-state (i.e., reinstall from that image, then alter it according to some random-seed that the test tool takes as a parameter.)
Of course, the developers will appreciate if the tool has nice things like "save state every 20,000 events" and "stop right before event #" and "step forward 1/10/100 events." This will greatly aid them in reproducing the problem, finding and fixing it.
As someone else pointed out, servers are another thing exposed to users. Get yourself a list of 1,000,000 URLs (grep from server logs), then feed them to your random number generator.
And remember: "system went 24 hours of random pounding without errors" does not mean it's ready to ship, it just means it's stable enough to start some serious testing. Before it can do that, QA should feel free to say "look, your POS can't even last 24 hours under life-like random user simulation -- you fix that, I'm going to spend some time writing better tools."
Oh yeah, one last thing: in addition to the "pound it as fast & hard as you can" tests, have the ability to do "exactly what a real user [who was perhaps deranged, or a baby bounding the keyboard/mouse] would do." That is, if you're doing random user-events; do them at the speed that a very-fast typist or very-fast mouse-user could do (with occasional delays, to simulate a SLOW person), in addition to "as fast as my program can spit-out events." These are two **very different* types of tests, and will get very different reactions when bugs are found.

To make tests reproducible, simply use a fixed seed start value. That ensures the same data is used whenever the test runs. Tests will reliably pass or fail.
Good / bad candidates? Randomized tests are good at finding edge cases (exceptions). A problem is to define the correct result of a randomized input.
Determining the number of times to run the code: Simply try it out, if it takes too long reduce the iteration count. You may want to use a code coverage tool to find out what part of your application is actually tested.
Are these kinds of tests well suited for unit tests? Yes.

This might be slightly off-topic, but if you're using .net, there is Pex, which does something similar to randomized testing, but with more intuition by attempting to generate a "random" test case that exercises all of the paths through your code.

Here is my answer to a similar question: Is it a bad practice to randomly-generate test data?. Other answers may be useful as well.
Random testing is a bad practice a
long as you don't have a solution for
the oracle problem, i.e.,
determining which is the expected
outcome of your software given its
input.
If you solved the oracle problem, you
can get one step further than simple
random input generation. You can
choose input distributions such that
specific parts of your software get
exercised more than with simple
random.
You then switch from random testing to
statistical testing.
if (a > 0)
// Do Foo
else (if b < 0)
// Do Bar
else
// Do Foobar
If you select a and b randomly in
int range, you exercise Foo 50% of
the time, Bar 25% of the time and
Foobar 25% of the time. It is likely
that you will find more bugs in Foo
than in Bar or Foobar.
If you select a such that it is
negative 66.66% of the time, Bar and
Foobar get exercised more than with
your first distribution. Indeed the
three branches get exercised each
33.33% of the time.
Of course, if your observed outcome is
different than your expected outcome,
you have to log everything that can be
useful to reproduce the bug.

Random testing has the huge advantage that individual tests can be generated for extremely low cost. This is true even if you only have a partial oracle (for example, does the software crash?)
In a complex system, random testing will find bugs that are difficult to find by any other means. Think about what this means for security testing: even if you don't do random testing, the black hats will, and they will find bugs you missed.
A fascinating subfield of random testing is randomized differential testing, where two or more systems that are supposed to show the same behavior are stimulated with a common input. If their behavior differs, a bug (in one or both) has been found. This has been applied with great effect to testing of compilers, and invariably finds bugs in any compiler that has not been previously confronted with the technique. Even if you have only one compiler you can try it on different optimization settings to look for varying results, and of course crashes always mean bugs.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js