Patterns for loading default data when unit testing

Patterns for loading default data when unit testing - unit-testing

Looking for some strategies for how you guys are loading default data when doing unit tests.

I use a builder that contains the default values, just like this: http://elegantcode.com/2008/04/26/test-data-builders-refined/. Then the test only specifies the value it cares for:
Customer customer = new CustomerBuilder()
.WithFirstName("this test only cares about a special ' ... first name value");
After reading the other answers, I want to clear that its not for database data. Its to build the instances/data you pass to the classes you are unit testing.
Its a matter of convenience/keeping the tests simple, plenty of times you are testing very specific behavior that depends on 1-3 fields and you don't care about the rest of the fields.

For unit testing I generally don't load data in advance - each test is designed to work against a data source that may or may not already contain existing records, and so each test writes all any records that are needed to complete the test.
When choosing values to submit to the database I use GUIDs (or other random values) whenever possible as it guarantees that values in the database are unique (e.g. if you create someone named "Mr X Y", it is helpful to know that searching for "X" should return only 1 result, and that there is no chance you have chanced on someone else in the database whose last name happens to be Y)
Often when unit testing I'm testing methods that modify data alongside methods that read data, and so my unit tests use the same API (the one being tested) to write to the database. (It's nice if each unit test covers a specific area of functionality, but it's not absolutely necessary)
If the API being tested doesn't have methods to write to the database however, I write my own set of helper functions - the exact structure is going to depend on the data source, but as an example this is where I often use LINQ to SQL.

TDD is about testing a piece of code in isolation. One create an instance of a class with its dependencies (or mocks of them), call the method under test and assert to verify the outcome of the test.
Usually with TDD one starts with a simple test, without data. When data are needed, they are created in the test fixture (the isolated environment where the test is executed) by the test setUp() method and then destroyed by the tearDown() method after the test has been run. Data are not loaded from the database.

Preferred strategy is in-transaction data. Spring offers extensive support (for both JUnit 3 and 4). With this strategy your test begins brand new transaction each time and your data is rolled back at the end of test.
Of course sometimes it's not enough: either data set is too extensive and shared across tests, or multiple transactions are part of the test scope. In that case, I recommend creating shared test data bed that is created before running test suite. There are frameworks for this (dbUnit) but you can also do without them if careful and consistent.
UPD: creating in-transaction data doesn't mean you not need test data, you are likely to end up creating re-usable and shared helper classes to maintain test data in all cases.

I typically have methods like GetCustomer() that return a generic customer. If I need to make the returned customer suite my needs for a particular test, I will simply change the property after it gets returned.
Other times I may pass some configuration information into my GetCustomer() method. For example GetCustomer(string customerType).
I've read expert's opinions that says that each test should contain its own unique data to work with and not try to make the data generic. Even though this may make each test "larger" in size, over all it will make the test clearer because the setup is specific to each test and the goals of each test. I like this advice because I've ran into many cases where trying to make the setup data generic, made things very sloppy very quick.

Related

Unit/integration testing nHibenrate query

Scenario: I need to write a complex nHibernate query, that would return projected DTO, but I want to use TDD approach. The method would look like this:
public PrintDTO GetUsersForPrinting(int userId)
{
Session.QueryOver<User>().//some joins, conditions etc.
//returns projected dto
}
Questions:
Since the most common approach is to use in memory database for this kind of operations. Should I write integration test?
If I am using in memory db can I write Unit tests?
Is one test is enough?
Since my integration test probably will check projection, how should I name it? "GetUserForPrinting_return_correct_DTO" seems too abstract and silly.
I ask because:
There is lots of abstract information about TDD and integration testing, but when it comes to concrete implementation it is very difficult to apply that information.
TDD suggests that integration test should be made of unit tests:

This is not really a very good problem to learn TDD with. I assume you don't already know what the complex query looks like, and you want to use test-driven techniques to drive it out. Awesome :)
But let's see if I can answer your questions.
Yes
any test that includes a real db, whether it is in-memory or on-disk, is not a unit test. A unit test would use a mock db.
Maybe - if you query is complex enough, then no.
testGetUsersForPrinting or getUsersForPrintingTest or similar
Most probably I would drive out the query in a SQL interpreter, not in code. The aim would be to produce a series of integration tests against an in-memory db based on what I learn during this process.
Start from the minimum possible DTO you can think of, and build up from there.
Finally convert the query into nhibernate calls, then make the integration tests pass.
Test-driven, but not really unit-test-driven.
If you are willing to accept maximum TDD discipline and deal with working slower and being more annoyed than usual, you can automate each integration test as you develop it and write code to make it pass. This will mean you are switching frequently among 3 levels of abstraction / editors / environments (direct SQL queries, integration tests, c# code) - I deal with this by setting up techniques to force myself to follow the right steps each time.
This last bit is why this is not a good problem to learn TDD with. You will need a lot of discipline you probably haven't forced yourself to acquire yet!
Good luck.
ok some concrete examples. I would modify your code sample to look like this
public PrintDTO GetUsersForPrinting(int userId, ISession session)
{
var data = session.QueryOver<User>().//some joins, conditions etc.
return data; // or whatever
}
In your unit test you would write
public testDTO()
{
//Arrange
StubSession session = .... setup a stub session, which returns hardcoded values
// Act
PrintDTO users = GetUsersForPrinting(111, session);
// Assert
Assert.That(users.size(), Is.EqualTo(1));
Assert.That(users.get(0).userId, Is.EqualTo(111));
}
In your integration test, you would use a real db, and your session object would actually connect to it, and the queries would be resolved against that db
Arrange-Act-Assert is a standard method for organizing unit tests.
Generally you want as few Asserts as possible in a unit test. And you will have multiple unit tests.
When you are writing a unit test, start by writing the Assert, then fill in the rest to make it compile/get the result you want. Make the test fail first, because then you know you have really delivered something when it passes.
In this example to implement a stub ISession you would derive a local StubSession class (only visible to the test suite) from ISession and just fill in the absolute minimum to get it to compile, and return the minimum data to get the test to pass.
To build up to your whole DTO - assuming you know what you want in your DTO - proceed, as you say in the comments, incrementally. Build up each part of your DTO
a piece at a time, add a unit test for each piece.
Keeping track of this is another piece of TDD discipline.
Set yourself up with a TODO list - just a simple text file, or possibly a lengthy comments at the start of your test suite. List all the things you want to test e.g. zero results, one result, two results, 20 results. User id, whatever other pieces of information you need to have.
If you are doing a complex query across tables or whatever add an todo item for each join, each part of the where clause, etc.
Add items for ordering and paging etc if you are using those.
Pick the simplest things first. Only do one small thing (in a single red-green-refactor cycle) at a time. As you work through your list, you might want to break items up into smaller pieces, or you might think of additional things you need to do. Add them to the TODO list rather than working directly on them.
In this particular case I would swap - after each red-green-refactor cycle - into the SQL environment and/or the sqlite integration test to work out how to make the next piece work. I guess this is a sort of step between red and green - choose what you will test next, write the test (which fails obviously), fiddle around in SQL until you know how to make it pass, write the nHibernate calls to make your test green, then refactor.
Be aware some of the things you list might run out not to be necessary, or take too long, etc. It's good to write them down still, so you know what are not doing as well as what you are doing. Keep focused on your goal.
I tend to also develop a list of "smells" and/or refactorings that I can see I will want to do but am not quite ready for this cycle. Remember to minimise duplication/refactor your tests as well as your SUT (System Under Test).
It's a doing rather then seeing thing. The list of what unit tests you end up with, and the code they exercise, is not a very good description of the journey. Kent Beck's original TDD book is slim and will give you some good overall pointers, but not really about constructing queries.
Does any of that help?

Since the most common approach is to use in memory database for this kind of operations. Should I write integration test?
Using in memory database still is an integration test (because it actually tests if your query generates correct SQL and execute it against a database, see).
If I am using in memory db can I write Unit tests?
No, it would be an integration test
Is one test is enough?
Probably not, you should check each condition of your query, for example one test per one where clause, one for paging and one for sorting if applicable.
Since my integration test probably will check projection, how should I
name it? "GetUserForPrinting_return_correct_DTO" seems too abstract
and silly.
GivenUserForPrinting_WhenGetUserForPrinting_ThenMapToDTO would be a better naming

How can I determine the name of my unit test before its execution?

I was using MSTest and all was fine. Not long ago, I needed to write a large number of data driven unit tests.
Moreover, I needed to know the name of the test just before I run it, so I could populate the data sources with the correct parameters (that were fetched from an external remote service).
Nowhere in MSTest could I find a way to get the name of the tests that are about to run before their actual execution. At this point it was, of course, already too late, since the data sources were already populated.
What I need is to know the names of the test that are about to execute so I could configure their data sources in advance, before their execution.
Somebody suggested I "check out NUnit". I am completely clueless about NUnit. For now I have started reading its documentation but am still at a loss. Have you any advice?

If you really need the test's name -- It's not well documented, but NUnit exposes a feature that let's you get access to the current test information:
namespace NUnitOutput.Example
{
using NUnit.Framework;
[TestFixture]
public class Demo
{
[Test]
public void WhatsMyName()
{
Console.WriteLine(TestContext.CurrentContext.Test.FullName);
Console.WriteLine(TestContext.CurrentContext.Test.Name);
}
}
}
Provides:
NUnitOutput.Example.Demo.WhatsMyName
WhatsMyName
Note this feature isn't guaranteed to implemented by custom TestRunners, like ReSharper. I have tested this in NUnit 2.5.9 (nunit.exe and nunit-console.exe)
However, re-reading your question I think you should check out is the TestCaseSource or TestCase attribute that can be used to parameterize your tests.

If I'm understanding your problem correctly, you want to get the name of the currently-running test so that you can use it as a key to look up a set of data with which to populate the data sources used by the code under test. Is that right?
If that's the case then I don't think you need to look for special functionality in your unit testing framework. Why not use the Reflection API to fetch the name of the currently-executing method? System.Reflection.MethodBase.GetCurrentMethod() will get you a MethodBase object representing the method, and that has a Name property.
However, I'd suggest that using the method name as a key for looking up the appropriate test data is a bad idea. You'd be coupling the name of the method to your data set, and that seems like a recipe for fragile code to me. You need to be remain free to refactor your code and the names of methods without worrying about whether that will break the database lookup behind-the-scenes.
As an alternative, why not consider creating a custom Attribute that you can use to mark those test methods that need a database lookup, and using a property on that attribute to hold the key?

For things like this you should rely on a fixture to initialize the state you want before you run the test.
The simplest way which works in (any) testing framework is to create a fixture which loads any data given a data identifier (string). Then in each test case you just provide the test string for data lookup for the data you want in that test.
Aside from this, it's not recommended to have unit tests access files and other external resources because it means slower unit tests and higher probability of failure (as you're relying on something outside the in-memory code). This of course depends on the amount of data you have and the type of testing you're doing, but I generally have the data compiled-in for unit tests.

How far should I go with unit testing?

I'm trying to unit test in a personal PHP project like a good little programmer, and I'd like to do it correctly. From what I hear what you're supposed to test is just the public interface of a method, but I was wondering if that would still apply to below.
I have a method that generates a password reset token in the event that a user forgets his or her password. The method returns one of two things: nothing (null) if everything worked fine, or an error code signifying that the user with the specified username doesn't exist.
If I'm only testing the public interface, how can I be sure that the password reset token IS going in the database if the username is valid, and ISN'T going in the database if the username is NOT valid? Should I do queries in my tests to validate this? Or should I just kind of assume that my logic is sound?
Now this method is very simple and this isn't that big of a deal - the problem is that this same situation applies to many other methods. What do you do in database centric unit tests?
Code, for reference if needed:
public function generatePasswordReset($username)
{
$this->sql='SELECT id
FROM users
WHERE username = :username';
$this->addParam(':username', $username);
$user=$this->query()->fetch();
if (!$user)
return self::$E_USER_DOESNT_EXIST;
else
{
$code=md5(uniqid());
$this->addParams(array(':uid' => $user['id'],
':code' => $code,
':duration' => 24 //in hours, how long reset is valid
));
//generate new code, delete old one if present
$this->sql ='DELETE FROM password_resets WHERE user_id=:uid;';
$this->sql.="INSERT INTO password_resets (user_id, code, expires)
VALUES (:uid, :code, now() + interval ':duration hours')";
$this->execute();
}
}

The great thing about unit testing, for me at least, is that it shows you where you need to refactor. Using your sample code above, you've basically got four things happening in one method:
//1. get the user from the DB
//2. in a big else, check if user is null
//3. create a array containing the userID, a code, and expiry
//4. delete any existing password resets
//5. create a new password reset
Unit testing is also great because it helps highlight dependencies. This method, as shown above, is dependent on a DB, rather than an object that implements an interface. This method interacts with systems outside its scope, and really could only be tested with an integration test, rather than a unit test. Unit tests are for ensuring the working/correctness of a unit of work.
Consider the Single Responsibility Principle: "Do one thing". It applies to methods as well as classes.
I'd suggest that your generatePasswordReset method should be refactored to:
be given a pre-defined existing user object/id. Do all those sanity checks outside of this method. Do one thing.
put the password reset code into its own method. That would be a single unit of work that could be tested independently of the SELECT, DELETE and INSERT.
Make a new method that could be called OverwriteExistingPwdChangeRequests() which would take care of the DELETE + INSERT.

The reason this function is more difficult to unit test is because the database update is a side-effect of the function (i.e. there is no explicit return for you to test).
One way of dealing with state updates on remote objects like this is to create a mock object that provides the same interface as the DB (i.e. it looks identical from the perspective of your code). Then in your test you can check the state changes within this mock object and confirm you have received what you should.

You can break it down some more, that function is doing alot which makes testing it a bit tricky, not impossible but tricky. If on the other hand you pulled out some smaller extra functions (getUserByUsername, deletePasswordByUserID, addPasswordByUserId, etc. Then you can test those easily enough once and know they work so you don't have to test them again. This way you test the lower down calls making sure they are sound so you don't have to worry about them further up the chain. Then for this function all you need to do is throw it a user that does not exist and make sure it comes back with a USER_DOESNT_EXIST error then one where a user does exist (this is where you test DB comes in). The inner works have already been exercised else where (hopefully).

Unit tests serve the purpose of verifying that a unit works. If you care to know whether a unit works or not, write a test. It's that simple. Choosing to write a unit test or not shouldn't be based on some chart or rule of thumb. As a professional it's your responsibility to deliver working code, and you can't know if it's working or not unless you test it.
Now, that doesn't mean you write a test for each and every line of code. Nor does it necessarily mean you write a unit test for every single function. Deciding to test or not test a particular unit of work boils down to risk. How willing are you to risk that your piece of untested code gets deployed?
If you're asking yourself "how do I know if this functionality works", the answer is "you don't, until you have repeatable tests that prove it works".

In general one might "mock" the object you are invoking, verifying that it receives the expected requests.
In this case I'm not sure how helpful that is, you amost end up writing the same logic twice ... we thought we sent "DELETE from password" etc. Oh look we did!
Hmmm, what did we actually check. If the string was badly formed, we woudln't know!
It may be against the letter of Unit testing law, but I would instead test these side-effects by doing separate queries against the database.

Testing the public interface is necessary, but not sufficient. There are many philosophies on how much testing is required, and I can only give my own opinion. Test everything. Literally. You should have a test that verifies that each line of code has been exercised by the test suite. (I only say 'each line' because I'm thinking of C and gcov, and gcov provides line-level granularity. If you have a tool that has finer resolution, use it.) If you can add a chunk of code to your code base without adding a test, the test suite should fail.

Databases are global variables. Global variables are public interfaces for every unit that uses them. Your test cases must therefore vary inputs not only on the function parameter, but also the database inputs.

If your unit tests have side-effects (like changing a database) then they have become integration tests. There is nothing wrong in itself with integration tests; any automated testing is good for the quality of your product. But integration tests have a higher maintenance cost because they are more complex and are easier to break.
The trick is therefore to minimize the code that can only be tested with side effects. Isolate and hide the SQL queries in a separate MyDatabase class which does not contain any business logic. Pass an instance of this object to your business logic code.
Then when you unit test your business logic, you can substitute the MyDatabase object with a mock instance which is not connected to a real database, and which can be used to verify that your business logic code uses the database correctly.
See the documentation of SimpleTest (a php mocking framework) for an example.

Database data needed in integration tests; created by API calls or using imported data?

This question is more or less programming language agnostic. However as I'm mostly into Java these days that's where I'll draw my examples from. I'm also thinking about the OOP case, so if you want to test a method you need an instance of that methods class.
A core rule for unit tests is that they should be autonomous, and that can be achieved by isolating a class from its dependencies. There are several ways to do it and it depends on if you inject your dependencies using IoC (in the Java world we have Spring, EJB3 and other frameworks/platforms which provide injection capabilities) and/or if you mock objects (for Java you have JMock and EasyMock) to separate a class being tested from its dependencies.
If we need to test groups of methods in different classes* and see that they are well integration, we write integration tests. And here is my question!
At least in web applications, state is often persisted to a database. We could use the same tools as for unit tests to achieve independence from the database. But in my humble opinion I think that there are cases when not using a database for integration tests is mocking too much (but feel free to disagree; not using a database at all, ever, is also a valid answer as it makes the question irrelevant).
When you use a database for integration tests, how do you fill that database with data? I can see two approaches:
Store the database contents for the integration test and load it before starting the test. If it's stored as an SQL dump, a database file, XML or something else would be interesting to know.
Create the necessary database structures by API calls. These calls are probably split up into several methods in your test code and each of these methods may fail. It could be seen as your integration test having dependencies on other tests.
How are you making certain that database data needed for tests is there when you need it? And why did you choose the method you choose?
Please provide an answer with a motivation, as it's in the motivation the interesting part lies. Remember that just saying "It's best practice!" isn't a real motivation, it's just re-iterating something you've read or heard from someone. For that case please explain why it's best practice.
*I'm including one method calling other methods in (the same or other) instances of the same class in my definition of unit test, even though it might technically not be correct. Feel free to correct me, but let's keep it as a side issue.

I prefer creating the test data using API calls.
In the beginning of the test, you create an empty database (in-memory or the same that is used in production), run the install script to initialize it, and then create whatever test data used by the database. Creation of the test data may be organized for example with the Object Mother pattern, so that the same data can be reused in many tests, possibly with minor variations.
You want to have the database in a known state before every test, in order to have reproducable tests without side effects. So when a test ends, you should drop the test database or roll back the transaction, so that the next test could recreate the test data always the same way, regardless of whether the previous tests passed or failed.
The reason why I would avoid importing database dumps (or similar), is that it will couple the test data with the database schema. When the database schema changes, you would also need to change or recreate the test data, which may require manual work.
If the test data is specified in code, you will have the power of your IDE's refactoring tools at your hand. When you make a change which affects the database schema, it will probably also affect the API calls, so you will anyways need to refactor the code using the API. With nearly the same effort you can also refactor the creation of the test data - especially if the refactoring can be automated (renames, introducing parameters etc.). But if the tests rely on a database dump, you would need to manually refactor the database dump in addition to refactoring the code which uses the API.
Another thing related to integration testing the database, is testing that upgrading from a previous database schema works right. For that you might want to read the book Refactoring Databases: Evolutionary Database Design or this article: http://martinfowler.com/articles/evodb.html

In integration tests, you need to test with real database, as you have to verify that your application can actually talk to the database. Isolating the database as dependency means that you are postponing the real test of whether your database was deployed properly, your schema is as expected and your app is configured with the right connection string. You don't want to find any problems with these when you deploy to production.
You also want to test with both precreated data sets and empty data set. You need to test both path where your app starts with an empty database with only your default initial data and starts creating and populating the data and also with a well-defined data sets that target specific conditions you want to test, like stress, performance and so on.
Also, make sur that you have the database in a well-known state before each state. You don't want to have dependencies between your integration tests.

Why are these two approaches defined as being exclusively?
I can't see any viable argument for
not using pre-existing data sets, especially particular data that has
caused problems in the past.
I can't
see any viable argument for not
programmatically extending that data with
all the possible conditions that
you can imagine causing problems and even a
bit of random data for integration
testing.
In modern agile approaches, Unit tests are where it really matters that the same tests are run each time. This is because unit tests are aimed not at finding bugs but at preserving the functionality of the app as it is developed, allowing the developer to refactor as needed.
Integration tests, on the other hand, are designed to find the bugs you did not expect. Running with some different data each time can even be good, in my opinion. You just have to make sure your test preserves the failing data if you get a failure. Remember, in formal integration testing, the application itself will be frozen except for bug fixes so your tests can be change to test for the maximum possible number and kinds of bugs. In integration, you can and should throw the kitchen sink at the app.
As others have noted, of course, all this naturally depends on the kind of application that you are developing and the kind of organization you are in, etc.

It sounds like your question is actually two questions. Should you exclude the database from your testing? When you do a database, then how should you generate the data in the database?
When possible I prefer to use an actual database. Frequently the queries (written in SQL, HQL, etc.) in CRUD classes can return surprising results when confronted with an actual database. It's better to flush these issues out early on. Often developers will write very thin unit tests for CRUD; testing only the most benign cases. Using an actual database for your testing can test all kinds corner cases you may not have even been aware of.
That being said there can be other issues. After each test you want to return your database to a known state. It my current job we nuke the database by executing all the DROP statements and then completely recreating all the tables from scratch. This is extremely slow on Oracle, but can be very fast if you use an in memory database like HSQLDB. When we need to flush out Oracle specific issues we just change the database URL and driver properties and then run against Oracle. If you don't have this kind of database portability then Oracle also has some kind of database snapshot feature which can be used specifically for this purpose; rolling back the entire database to some previous state. I'm sure what other databases have.
Depending on what kind of data will be in your database the API or the load approach may work better or worse. When you have highly structured data with many relations, APIs will make your life easier my making the relations between your data explicit. It will be harder for you to make a mistake when creating your test data set. As mentioned by other posters refactoring tools can take care of some of the changes to structure of your data automatically. Often I find it useful to think of API generated test data as composing a scenario; when a user/system has done steps X, Y Z and then tests will go from there. These states can be achieved because you can write a program that calls the same API your user would use.
Loading data becomes much more important when you need large volumes of data, you have few relations between within your data or there is consistency in the data that can not be expressed using APIs or standard relational mechanisms. At one job that at worked at my team was writing the reporting application for a large network packet inspection system. The volume of data was quite large for the time. In order to trigger a useful subset of test cases we really needed test data generated by the sniffers. This way correlations between the information about one protocol would correlate with information about another protocol. It was difficult to capture this in the API.
Most databases have tools to import and export delimited text files of tables. But often you only want subsets of them; making using data dumps more complicated. At my current job we need to take some dumps of actual data which gets generated by Matlab programs and stored in the database. We have tool which can dump a subset of the database data and then compare it with the "ground truth" for testing. It seems our extraction tools are being constantly modified.

I've used DBUnit to take snapshots of records in a database and store them in XML format. Then my unit tests (we called them integration tests when they required a database), can wipe and restore from the XML file at the start of each test.
I'm undecided whether this is worth the effort. One problem is dependencies on other tables. We left static reference tables alone, and built some tools to detect and extract all child tables along with the requested records. I read someone's recommendation to disable all foreign keys in your integration test database. That would make it way easier to prepare the data, but you're no longer checking for any referential integrity problems in your tests.
Another problem is database schema changes. We wrote some tools that would add default values for columns that had been added since the snapshots were taken.
Obviously these tests were way slower than pure unit tests.
When you're trying to test some legacy code where it's very difficult to write unit tests for individual classes, this approach may be worth the effort.

I do both, depending on what I need to test:
I import static test data from SQL scripts or DB dumps. This data is used in object load (deserialization or object mapping) and in SQL query tests (when I want to know whether the code will return the correct result).
Plus, I usually have some backbone data (config, value to name lookup tables, etc). These are also loaded in this step. Note that this loading is a single test (along with creating the DB from scratch).
When I have code which modifies the DB (object -> DB), I usually run it against a living DB (in memory or a test instance somewhere). This is to ensure that the code works; not to create any large amount of rows. After the test, I rollback the transaction (following the rule that tests must not modify the global state).
Of course, there are exceptions to the rule:
I also create large amount of rows in performance tests.
Sometimes, I have to commit the result of a unit test (otherwise, the test would grow too big).

I generally use SQL scripts to fill the data in the scenario you discuss.
It's straight-forward and very easily repeatable.

This will probably not answer all your questions, if any, but I made the decision in one project to do unit testing against the DB. I felt in my case that the DB structure needed testing too, i.e. did my DB design deliver what is needed for the application. Later in the project when I feel the DB structure is stable, I will probably move away from this.
To generate data I decided to create an external application that filled the DB with "random" data, I created a person-name and company-name generators etc.
The reason for doing this in an external program was:
1. I could rerun the tests on by test modified data, i.e. making sure my tests were able to run several times and the data modification made by the tests were valid modifications.
2. I could if needed, clean the DB and get a fresh start.
I agree that there are points of failure in this approach, but in my case since e.g. person generation was part of the business logic generating data for tests was actually testing that part too.

Our team confront the same question recently.
Before, we were using specflow to do integration testing. With specflow, QA can write each test case inside which populating necessary test data to DB.
Now, QA want to use postman to test API, how can they populate the data? One solution is creating Apis for populating them. Another is sync historical data from Prod to test env.
Will update my answer once we try different solutions and decide which one to go.

How do you unit test business applications?

How are people unit testing their business applications? I've seen a lot of examples of unit testing with "simple to test" examples. Ex. a calculator. How are people unit testing data-heavy applications? How are you putting together your sample data? In many cases, data for one test may not work at all for another test which makes it hard to just have one test database?
Testing the data access portion of the code is fairly straightforward. It's testing out all the methods that work against the data that seem to be hard to test. For example, imagine a posting process where there is heavy data access to determine what is posted, numbers are adjusted, etc. There are a number of interim steps that occur (and need to be tested) along with tests afterwards that ensure the posting was successful. Some of those steps may actually be stored procedures.
In the past I've tried inserting the test data in a test database, then running the test, but honestly it's pretty painful to write this kind of code (and error prone). I've also tried just building a test database up front and rolling back the changes. That works OK but in a number of places you can't easily do this either (and many people would say that's integration testing; so be it, I still need to be able to test this somehow).
If the answer is that there isn't a nice way of handling this and it currently just sort of sucks, that would be useful to know as well.
Any thoughts, ideas, suggestions, or tips are appreciated.

My automated functional tests usually follow one of two patters:
Database Connected Tests
Mock Persistence Layer Tests
Database Connected Tests
When I have automated tests that are connected to the database, I usually make a single test database template that has enough data for all the tests. When the automated tests are run, a new test database is generated from the template for every test. The test database has to be constantly re-generated because test will often change the data. As tests are added, I usually append more data to the test database template.
There are some nice advantages to this testing method. The obvious advantage is that the tests also exercise your schema. Another advantage is that after setting up the initial tests, most new tests will be able to re-use the existing test data. This makes it easy to add more tests.
The downside is that the test database will become unwieldy. Because data will usually be added one test at time, it will be inconsistent and maybe even unrealistic. You will also end up cursing the person who setup the test database when there is a significant database schema change (which for me usually means I end up cursing myself).
This style of testing obviously doesn't work if you can't generate new test databases at will.
Mock Persistence Layer Tests
For this pattern, you create mock objects that live with the test cases. These mock objects intercept the calls to the database so that you can programmatically provide the appropriate results. Basically, when the code you're testing calls the findCustomerByName() method, your mock object is called instead of the persistence layer.
The nice thing about using mock object tests is that you can get very specific. Often times, there are execution paths that you simply can't reach in automated tests w/o mock objects. They also free you from maintaining a large, monolithic set of test data.
Another benefit is the lack of external dependencies. Because the mock objects simulate the persistence layer, your tests are no longer dependent on the database. This is often the deciding factor when choosing which pattern to choose. Mock objects seem to get more traction when dealing with legacy database systems or databases with stringent licensing terms.
The downside of mock objects is that they often result in a lot of extra test code. This isn't horrible because almost any amount of testing code is cheap when amortized over the number of times you run the test, but it can be annoying to have more test code then production code.

I have to second the comment by #Phil Bennett as I try to approach these integration tests with a rollback solution.
I have a very detailed post about integration testing your data access layer here
I show not only the sample data access class, base class, and sample DB transaction fixture class, but a full CRUD integration test w/ sample data shown. With this approach you don't need multiple test databases as you can control the data going in with each test and after the test is complete the transactions are all rolledback so your DB is clean.
About unit testing business logic inside your app, I would also second the comments by #Phil and #Mark because if you mock out all the dependencies your business object has, it becomes very simple to test your application logic one entity at a time ;)
Edit: So are you looking for one huge integration test that will verify everything from logic pre-data base / stored procedure run w/ logic and finally a verification on the way back? If so you could break this out into 2 steps:
1 - Unit test the logic that happens before the data is pushed
into your data access code. For
example, if you have some code that
calculates some numbers based on
some properties -- write a test that
only checks to see if the logic for
this 1 function does what you asked
it to do. Mock out any dependancy
on the data access class so you can
ignore it for this test of the
application logic alone.
2 - Integration test the logic that happens once you take your
manipulated data (from the previous
method we unit tested) and call the
appropriate stored procedure. Do
this inside a data specific testing
class so you can rollback after it's
completed. After your stored
procedure has run, do a query
against the database to get your
object now that we have done some
logic against the data and verify it
has the values you expected
(post-stored procedure logic /etc )
If you need an entry in your database for the stored procedure to run, simply insert that data before you run the sproc that has your logic inside it. For example, if you have a product that you need to test, it might require a supplier and category entry to insert so before you insert your product do a quick and dirty insert for a supplier and category so your product insert works as planned.

It depends on what you're testing. If you're testing a business logic component -- then its immaterial where the data is coming from and you'd probably use a mock or a hand rolled stub class that simulates the data access routine the component would have called in the wild. The only time I mess with the data access is when I'm actually testing the data access components themselves.
Even then I tend to open a DB transaction in the TestFixtureSetUp method (obviously this depends on what unit testing framework you might be using) and rollback the transaction at the end of the test suite TestFixtureTeardown.

Mocking Frameworks enable you to test your business objects.
Data Driven tests often end up becoming more of a intergration test than a unit test, they also carry with them the burden of managing the state of a data store pre and post execution of the test and the time taken in connecting and executing queries.
In general i would avoid doing unit tests that touch the database from your business objects. As for Testing your database you need a different stratergy.
That being said you can never totally get away from data driven testing only limiting the amout of tests that actually need to invoke your back end systems.

It sounds like you might be testing message based systems, or systems with highly parameterised interfaces, where there are large numbers of permutations of input data.
In general all the rules of standard unti testing still hold:
Try to make the units being tested as small and discrete as possible.
Try to make tests independant.
Factor code to decouple dependencies.
Use mocks and stubs to replace dependencies (like dataaccess)
Once this is done you will have removed a lot of the complexity from the tests, hopefully revealing good sets of unit tests, and simplifying the sample data.
A good methodology for then compiling sample data for test that still require complex input data is Orthogonal testing, or see here.
I've used that sort of method for generating test plans for WCF and BizTalk solutions where the permutations of input messages can create multiple possible execution paths.

For lots of different runs over the same logic but with different data you can use CSV, as many columns as you like for the input and the last for the output etc.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js