Tips for testing data intensive legacy application


I'm working on a very large, data-intensive legacy application. Both the code base & database are massive in scale. A great deal of the business logic is spread across all of the tiers including in stored procedures.
Does anybody have any suggestions on how to begin introducing "unit" tests (technically integration tests, because they need to test across tiers for a single aspect of almost any given process) into the application in an efficient way? The current architecture does not easily support any type of injection or mocking. New code is being written to facilitate testing, but what about the legacy code? Because of the strong dependency on the data itself and on business logic in the database, I'm currently using inline SQL to find data to use for testing, but this is time-consuming. Creating views and/or stored procedures will not suffice.
What approaches have you taken (if applicable)? What worked? What didn't & why? Any suggestions would be appreciated. Thanks.

Get a copy of Working Effectively with Legacy Code by Michael Feathers. It is full of useful advice for working with large, untested codebases.
Another good book is Object Oriented Reengineering Patterns. Most of the book is not specific to object-oriented software. The full text is available for free download in PDF format.
From my own experience: try to...
Automate the build and deployment
Get the database schema into version control, if it isn't yet. Usually databases include reference data that the transactional code needs to exist before it can work. Get this under version control too. Tools like dbdeploy can help you easily rebuild a schema and reference data from a sequence of deltas.
Install a version of the database (and any other infrastructure services) onto your development workstation. This will let you work on the database without continually having to go through DBAs. It's also faster than using a schema on a shared server in a remote datacentre. All major commercial database servers have free (as in beer) development versions that work on Windows (if you're stuck in the unenviable situation of developing on Windows and deploying on Unix).
Before starting work on an area of the code, write end-to-end tests that roughly cover the behaviour of the area you're working on. An end-to-end test should exercise the system from outside -- by controlling its user interface or interacting through network services -- so you won't need to change the code to put it into place. It will act as an (imperfect) regression test and give you more confidence to refactor the internals of the system towards a structure that is easier to unit test.
If there are manual test plans, read them and see what can be automated. Most manual test plans are almost entirely scripted and so are low-hanging fruit for automation.
Once you've got end-to-end test coverage, refactor the code into more loosely coupled units as you modify and/or extend it. Surround those units with unit tests.
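The idea of rebuilding a schema and its reference data from a sequence of deltas can be sketched roughly as follows. This is not dbdeploy itself, only a minimal illustration in Python with sqlite3; the `changelog` table, the `DELTAS` list and `apply_deltas` are hypothetical names, and real deltas would normally live as numbered SQL files in version control.

```python
import sqlite3

# Hypothetical numbered deltas (schema changes and reference data alike).
DELTAS = [
    (1, "CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "INSERT INTO country (code, name) VALUES ('GB', 'United Kingdom')"),
    (3, "ALTER TABLE country ADD COLUMN currency TEXT"),
]

def apply_deltas(conn, deltas):
    """Apply, in order, every delta not yet recorded in the changelog table."""
    conn.execute("CREATE TABLE IF NOT EXISTS changelog (delta_id INTEGER PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT delta_id FROM changelog")}
    newly_applied = []
    for delta_id, sql in sorted(deltas):
        if delta_id in applied:
            continue
        conn.execute(sql)
        conn.execute("INSERT INTO changelog (delta_id) VALUES (?)", (delta_id,))
        newly_applied.append(delta_id)
    conn.commit()
    return newly_applied

conn = sqlite3.connect(":memory:")
apply_deltas(conn, DELTAS)  # builds schema and reference data from scratch
apply_deltas(conn, DELTAS)  # a second run finds nothing left to apply
```

Because every workstation and test environment is built from the same ordered list, a fresh database and an upgraded one should end up with the same schema.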
Things to avoid:
Copying data from the production database into the environment you use for automated testing. This will make your tests unpredictable. Sure, run the system against a copy of production data, but use that for exploratory testing, not regression testing.
Rolling back transactions at the end of tests to isolate tests from one another. This will not test behaviour that only happens when transactions are committed, and will throw away data that is valuable for diagnosing test failures. Instead, tests should ensure the database is in a known initial state when they start.
Creating a "tiny" data set for tests to run against. This makes tests hard to understand because they cannot be read as a single unit. The "tiny" data set soon grows very large as you add tests for different scenarios. Instead, tests can insert data into the database to set up the test-fixture.
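A minimal sketch of the alternative described in the last two points, assuming Python's unittest and an in-memory sqlite3 database as a stand-in for the development database. The `customer` table and `customers_older_than` are hypothetical; the point is only that the test resets state when it starts, inserts its own fixture, and commits.

```python
import sqlite3
import unittest

SCHEMA = "CREATE TABLE IF NOT EXISTS customer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"

def customers_older_than(conn, age):
    """Hypothetical stand-in for the code under test."""
    rows = conn.execute("SELECT name FROM customer WHERE age > ? ORDER BY name", (age,))
    return [name for (name,) in rows]

class CustomerSearchTest(unittest.TestCase):
    def setUp(self):
        # Assumption: a real test would connect to the developer's own database.
        self.conn = sqlite3.connect(":memory:")
        self.addCleanup(self.conn.close)
        self.conn.execute(SCHEMA)
        # Reset to a known state at the START of the test, not the end, so data
        # left behind by a failed test is still there to inspect afterwards.
        self.conn.execute("DELETE FROM customer")
        self.conn.commit()

    def test_finds_only_customers_over_the_age_limit(self):
        # The fixture lives in the test, so the test reads as a single unit.
        self.conn.executemany(
            "INSERT INTO customer (name, age) VALUES (?, ?)",
            [("Alice", 70), ("Bob", 35)],
        )
        self.conn.commit()  # commit for real instead of rolling back
        self.assertEqual(customers_older_than(self.conn, 60), ["Alice"])
```

The class can be run with `python -m unittest`.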

“Testing Legacy Application Modernization,” highlights:
High level overview of how tests are created in AscentialTest
Ways to convert the legacy objects to the new platform
Components of Object definition
How to ensure that the modernized version of the application produces the same results
For more details on testing legacy applications, see:
http://application-management.cioreview.com/whitepaper/testing-legacy-application-modernization-wid-529.html

As mentioned before, there are some very good books out there. I highly recommend taking a look at Working Effectively with Legacy Code.
One thing you could do is follow a data-driven approach: observe your application and introduce tests where you feel the most “pain”. A semi-deterministic approach you might find useful: https://link.medium.com/zY9Tysfne9

Related

How to do unit testing in Microsoft Dynamics AX 2012 in a real world project

Dynamics AX 2012 comes with unit testing support.
To have meaningful tests some test data needs to be provided (stored in tables in the database).
To get a reproducible outcome of the unit tests we need to have the same data stored in the tables every time the tests are run. Now the question is, how can we accomplish this?
I learned that there is the possibility of setting the isolation level for the TestSuite to SysTestSuiteCompanyIsolateClass. This will create an empty company and delete the company after the tests have been run. In the setup() method I can fill my test data into the tables with insert statements. This works fine for small scenarios but becomes cumbersome very fast if you have a real-life project.
I was wondering if there is anyone out there with a practical solution of how to use the X++ Unit Test Framework in a real world scenario. Any input is very much appreciated.

I agree that creating test data in a new and empty company only works for fairly trivial scenarios or scenarios where you implemented the whole data structure yourself. But as soon as existing data structures are needed, this approach can become very time consuming.
One approach that worked well for me in the past is to run unit tests in an existing company that already has most of the configuration data (e.g. financial setup, inventory setup, ...) needed to run the test. The test itself runs in a ttsBegin - ttsAbort block so that the unit test does not actually persist any data.
Another approach is to implement data provider methods that are test agnostic, but create data that is often used in unit tests (e.g. a method that creates a product). It takes some time to create a useful set of data provider methods, but once they exist, writing unit tests becomes a lot faster. See SysTest part V.: Test execution (results, runners and listeners) on how Microsoft uses a similar approach (or at least they used to back in 2007 for AX 4.0).
Both approaches can also be combined: you would call the data provider methods inside the ttsBegin - ttsAbort block to create the needed data only for the unit test.
Another useful method is to use doInsert or doUpdate to create your test data, especially if you are only interested in a few fields and do not need to create a completely valid record.

I think that the unit test framework was an afterthought. In order to really use it, Microsoft would have needed to provide unit test classes, then when you customize their code, you also customize their unit tests.
So without that, you're essentially left coding unit tests that try and encompass base code along with your modifications, which is a huge task.
Where I think you can actually use it is around isolated customizations that perform some function, and aren't heavily built on base code. And also with customizations that are integrations with external systems.

Well, from my point of view, you will not be able to get more out of the standard framework than what you have already pointed out.
What you can do is more around release management. You can set up an integration environment with the targeted data, push your nightly build model into this environment at the end of the build process, and then run your tests.
Yes, it takes more effort to set up and maintain, but it's the only solution I've seen until now for having a large and consistent set of data to run unit or integration tests on.

To have meaningful tests some test data needs to be provided (stored
in tables in the database).
As someone else already indicated - I found it best to leverage an existing company for data. In my case, several existing companies.
To get a reproducible outcome of the unit tests we need to have the
same data stored in the tables every time the tests are run. Now the
question is, how can we accomplish this?
We have built test helpers that help us "run the test", automating what a person would do, given that you have architected your application to be testable. In essence, our test class uses the helpers to run the test, then provides most of its value in validating the data it created.
I learned that there is the possibility of setting the isolation level
for the TestSuite to SysTestSuiteCompanyIsolateClass. This will create
an empty company and delete the company after the tests have been run.
In the setup() method I can fill my testdata into the tables with
insert statements. This works fine for small scenarios but becomes
cumbersome very fast if you have a real life project.
I did not find this practical in our situation, so we haven't leveraged it.
I was wondering if there is anyone out there with a practical solution
of how to use the X++ Unit Test Framework in a real world scenario.
Any input is very much appreciated.
We've been using the testing framework as stated above and it has been working for us. The key is to find the correct scenarios to test; doing so also provides a good foundation for writing testable classes.

Do atomic tests make sense in dynamically created environments?

We're building a product (a web app) that allows users to create custom databases and store data within those DBs.
Our issue with testing the frontend (CoffeeScript) is that every test should be atomic, but that would require setting up a DB just to see whether an item within that DB can be created and persists, or to see how changes in a DB affect items.
Essentially, the issue is that the setup code needed to get to the item tests basically sets up a new DB and therefore equals the code that tests setting up a new DB.
There are two approaches and we're torn on which to use:
1) Create and tear down a new DB with each group of tests
(+) Sorta Atomic (still fails if setting up a DB fails)
(-) Takes a lot of time to execute
(-) Tons of surrounding code
(-) No way to explore the created environment
(-) Messy on errors, everything fails
2) Do the setup step by step as separate tests depending on each other, with a cleanup routine at the beginning of each test
(+) The created environment can be accessed via the UI (not automatically torn down)
(+) Step by step testing, less overall/repetitive code
(-) Tests depend on each other (messy)
(-) Somewhat messy overall
We're wondering, therefore, whether the golden rule that tests should be atomic makes sense in such a dynamic environment?

Basically, what you are talking about is Integration tests. These are different from Unit Tests. Examples of integration test would be Automated UI tests or Coded UI tests. In most of the projects I've worked on we've had both types of tests and I strongly encourage you to have both types in your project too.
The philosophy behind both these tests is slightly different.
Unit Tests are meant to test isolated bits of functionality.
They are meant to be very fast.
A developer should be able to run them all on their machine in a reasonable amount of time.
There are various consequences of this philosophy.
Because a unit test tests an isolated bit of functionality, you should use mocks and stubs to isolate it from the rest of the environment and focus only on tiny bits of functionality.
The isolation helps your "design thinking" while writing these tests. In fact this is the reason why the unit tests are required to be fast, because a developer is actively and constantly changing the code and unit tests as part of the design and redesign process. There should be very low overhead to set up, change and run the unit tests. I should be able to ignore everything other than the problem I am trying to solve and quickly iterate and reiterate my designs and tests. This is the idea behind TDD and its claim to help write good testable code. If you are spending a long time trying to set up an overly complex unit test then you have to start reconsidering your design.
The fast nature means that you could run these as part of your Continuous Integration build.
The disadvantage is that because you are testing each piece of functionality in isolation, you don't know if they will all work together as a whole. Each time you write a mock, you are implicitly baking in an assumption about how the rest of the system works, and that the rest of the system is currently working as it is meant to (i.e. nothing else was broken by your deployment, by running the system, by patching the OS etc.)
Integration Tests are meant to test the functionality from end to end. You try NOT to mock out or isolate any part of the system.
There are again various consequences of this philosophy. Note that there is no requirement for integration tests to be fast.
Integration tests, by their very nature need to run after your full deployment (as opposed to unit tests which can be run as soon as your code compiles).
Because they take longer, you don't run them as part of your CI environment, but you still need to run them regularly. We usually run them as part of our nightly builds. Or you can run it twice daily etc.
Because the integration tests take a black-box approach to the whole system, they don't really help with your "design thinking" about how to actually build the system. But they do help your thinking about the specifications of the system as a whole, i.e. what the system should do, not how it should do it.
Note that in both cases the rule of tests being atomic still applies. Each test is independent of the other tests. This way, when a test fails, you can be sure about all the conditions that are causing it to fail and concentrate on fixing only that. It's just that an integration test touches as many parts of your system as possible.
To give you an example on our current project.
Let's say we need to write a bit of functionality that requires us to add a new table to the DB and bring it through all the layers to show it in the UI.
We start by creating our business logic classes, domain classes, write the appropriate web service, build view models, modify the database etc. While doing each of these we write unit tests to test the code we are currently writing. So when building the business logic classes, we mock out everything else to ensure that the logic in the class is valid (for example, clients over 60 years old get a 50% discount on their car insurance etc.)
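As a rough illustration of such a unit test, here is a minimal sketch using Python's `unittest.mock`. The function `insurance_quote`, the repository's `find` method and the dictionary shape of a client are all assumptions made up for the example; only the over-60 discount rule comes from the text above.

```python
from unittest.mock import Mock

def insurance_quote(client_id, client_repository, base_price):
    """Hypothetical business rule: clients over 60 get a 50% discount."""
    client = client_repository.find(client_id)
    if client["age"] > 60:
        return base_price * 0.5
    return base_price

# The repository is mocked, so no database or web service is involved.
repository = Mock()
repository.find.return_value = {"age": 70}
discounted = insurance_quote(1, repository, base_price=200.0)

# Boundary case: exactly 60 is not "over 60".
repository.find.return_value = {"age": 60}
full_price = insurance_quote(2, repository, base_price=200.0)
```

Nothing here touches a real data store, which is what keeps this kind of test fast enough to run on every change.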
Once we do that, we need to update our deployment scripts / packages etc. to be able to deploy it, i.e. update the database creation SQL scripts, the database alteration SQL scripts etc. (In your case this will be a complex process.)
Now we write integration tests. In this case we might test from SQL Server to the web service. There is a SQL integration test base class which contains the set-up and tear-down methods for each test. In the set-up we create a brand new database using our SQL deployment scripts. Each test also specifies a test data SQL script. So, for example, this test data script might insert a new record into the client table whose age is 70 years. We run this script as part of the "Arrange" of our test. Then we make a web service call to search for clients older than 60. This is the "Act" part of the test and, from the result, we check that we only get back the user we've inserted into the DB. At the end of the test, the database is deleted. We've caught bugs here when columns in the SQL database aren't nullable, or when datetime columns overflow because the default minimum DateTime in .NET (year 0001) is earlier than the minimum of SQL Server's datetime type (year 1753).
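The shape of such a base class can be sketched as follows. This is an assumption-laden illustration in Python with unittest and an in-memory sqlite3 database, not the project's actual .NET/SQL Server code: `DEPLOY_SCRIPT`, `search_clients_older_than` (standing in for the web service call) and the class names are hypothetical, and a second, younger client is added so the "only" in the assertion means something.

```python
import sqlite3
import unittest

# Assumption: stand-in for the real deployment SQL scripts.
DEPLOY_SCRIPT = """
    CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER NOT NULL);
"""

def search_clients_older_than(conn, age):
    """Hypothetical stand-in for the web service call."""
    return [row[0] for row in conn.execute("SELECT name FROM client WHERE age > ?", (age,))]

class SqlIntegrationTestBase(unittest.TestCase):
    test_data_script = ""  # each test class names its own test data script

    def setUp(self):
        self.conn = sqlite3.connect(":memory:")          # brand new database per test
        self.conn.executescript(DEPLOY_SCRIPT)           # built by the deployment script
        self.conn.executescript(self.test_data_script)   # "Arrange"

    def tearDown(self):
        self.conn.close()  # closing an in-memory database deletes it

class SearchOlderClientsTest(SqlIntegrationTestBase):
    test_data_script = """
        INSERT INTO client (name, age) VALUES ('Pensioner', 70);
        INSERT INTO client (name, age) VALUES ('Youngster', 30);
    """

    def test_returns_only_clients_over_60(self):
        found = search_clients_older_than(self.conn, 60)  # "Act"
        self.assertEqual(found, ["Pensioner"])            # "Assert"
```

Because the database is created by the same script that deployment uses, a broken deployment script fails every test in the suite straight away.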
Some functionality requires us to interact with an Oracle database. For example, if a new record is added to Oracle, then a trigger/db procedure kicks off and transfers that record to SQL, and then we need to bring it up the layers. In this case we have an OracleSQL integration test base class. As you might have guessed, this follows a similar pattern, but creates both Oracle and SQL DBs, inserts test data into Oracle, and blows them both away at the end of the test.
The developers usually pick the web service layer for writing their integration tests. The testers, on the other hand, use UI automation tools to make sure that the data is actually showing up on screen. For example, they will record a test that goes to the web page, clicks the search button, puts "60" into the age box, clicks the search button etc. That test might leverage the same test data SQL script that the developer wrote (or the testing team might come to the developer and ask for help crafting SQL scripts to insert whatever highly convoluted data they can think of). But the point is, once the test data insertion script is created, it leverages the same underlying system to blow away the whole DB, create a new one, insert test data, and run the specified test.
So, to answer your question, you will need two types of tests, unit tests and integration tests. You might have to put in some initial work into creating some base classes or helper methods to create/delete databases, automating your deployment to install/uninstall other components of your system etc. You will have to do this for your final deployment anyway. Integration tests will also be closely related to and dependent on your deployment strategy. This is an advantage and not a disadvantage in my opinion. While it might be painful at first to set it all up, one of the things your integration tests are implicitly testing is your deployment mechanism. If there are any issues with deploying/installing any of the components required by your system, you want to know about it as quickly as possible. Not the day before you are supposed to be deploying to production.
A good suite of tests is invaluable. It also needs to be isolated, rigorous and comprehensive. The tests shouldn't fail when they don't need to but more importantly, they should fail when they need to. And when they do fail, you want them to provide as much information as possible and point you at the exact location of failure. This makes fixing the issue a much easier task. Any time you put into building this test suite will more than pay for itself in no time.

You're not doing atomic tests if you're talking to a database.
You need to mock the database interface and talk to the mock instead. That will be fast, and you'll be able to use the mock to introduce errors that would be difficult using the real database.
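A minimal sketch of injecting a database error through a mock, using Python's `unittest.mock`. The `insert` method, the `DatabaseUnavailable` exception and `save_item` are hypothetical; they only illustrate forcing a failure that would be awkward to provoke with a real database.

```python
from unittest.mock import Mock

class DatabaseUnavailable(Exception):
    """Hypothetical error raised by the database interface."""

def save_item(db, item):
    """Hypothetical code under test: reports failure instead of crashing."""
    try:
        db.insert("items", item)
        return "saved"
    except DatabaseUnavailable:
        return "retry later"

db = Mock()
happy_path = save_item(db, {"name": "widget"})

# Make the mocked interface fail on the next call.
db.insert.side_effect = DatabaseUnavailable("connection lost")
error_path = save_item(db, {"name": "widget"})
```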

Integration testing - can it be done right?

I used TDD as a development style on some projects in the past two years, but I always get stuck on the same point: how can I test the integration of the various parts of my program?
What I am currently doing is writing a testcase per class (this is my rule of thumb: a "unit" is a class, and each class has one or more testcases). I try to resolve dependencies by using mocks and stubs and this works really well as each class can be tested independently. After some coding, all important classes are tested. I then "wire" them together using an IoC container. And here I am stuck: how do I test whether the wiring was successful and the objects interact the way I want?
An example: Think of a web application. There is a controller class which takes an array of ids, uses a repository to fetch the records based on these ids and then iterates over the records and writes them as a string to an outfile.
To make it simple, there would be three classes: Controller, Repository, OutfileWriter. Each of them is tested in isolation.
What I would do in order to test the "real" application: make the HTTP request (either manually or automated) with some ids from the database and then look in the filesystem to see if the file was written. Of course this process could be automated, but still: doesn't that duplicate the test logic? Is this what is called an "integration test"? In a book I recently read about unit testing, it seemed to me that integration testing was more of an anti-pattern?

IMO, and I have no literature to back me on this, the key difference between our various forms of testing is scope:
Unit testing is testing isolated pieces of functionality [typically a method or stateful class]
Integration testing is testing the interaction of two or more dependent pieces [typically a service and consumer, or even a database connection, or connection to some other remote service]
System integration testing is testing of a system end to end [a special case of integration testing]
If you are familiar with unit testing, then it should come as no surprise that there is no such thing as a perfect or 'magic-bullet' test. Integration and system integration testing is very much like unit testing, in that each is a suite of tests set to verify a certain kind of behavior.
For each test, you set the scope which then dictates the input and expected output. You then execute the test, and evaluate the actual to the expected.
In practice, you may have a good idea how the system works, and so writing typical positive and negative path tests will come naturally. However, for any application of sufficient complexity, it is unreasonable to expect total coverage of every possible scenario.
Unfortunately, this means unexpected scenarios will crop up in Quality Assurance [QA], PreProduction [PP], and Production [Prod] cycles. At which point, your attempts to replicate these scenarios in dev should make their way into your integration and system integration suites as automated tests.
Hope this helps, :)
ps: pet-peeve #1: managers or devs calling integration and system integration tests "unit tests" simply because NUnit or MSTest was used to automate them ...

What you describe is indeed integration testing (more or less). And no, it is not an antipattern, but a necessary part of the software development lifecycle.
Any reasonably complicated program is more than the sum of its parts. So however well you unit test it, you still have not much clue about whether the whole system is going to work as expected.
There are several aspects of why it is so:
unit tests are performed in an isolated environment, so they can't say anything about how the parts of the program are working together in real life
the "unit tester hat" easily limits one's view, so there are whole classes of factors which the developers simply don't recognize as something that needs to be tested*
even if they do, there are things which can't be reasonably tested in unit tests - e.g. how do you test whether your app server survives under high load, or if the DB connection goes down in the middle of a request?
* One example I just read from Luke Hohmann's book Beyond Software Architecture: in an app which applied strong antipiracy defense by creating and maintaining a "snapshot" of the IDs of HW components in the actual machine, the developers had the code very well covered with unit tests. Then QA managed to crash the app in 10 minutes by trying it out on a machine without a network card. As it turned out, since the developers were working on Macs, they took it for granted that the machine has a network card whose MAC address can be incorporated into the snapshot...

What I would do in order to test the
"real" application: making the http
request (either manually or automated)
with some ids from the database and
then look in the filesystem if the
file was written. Of course this
process could be automated, but still:
doesn't that duplicate the test-logic?
Maybe you are duplicating code, but you are not duplicating effort. Unit tests and integration tests serve two different purposes, and usually both purposes are desired in the SDLC. If possible, factor out code used for both unit and integration tests into a common library. I would also try to have separate projects for your unit and integration tests, because your unit tests should be run separately (fast and with no dependencies). Your integration tests will be more brittle and break more often, so you will probably have a different policy for running and maintaining them.
Is this what is called an "integration
test"?
Yes indeed it is.

In an integration test, just as in a unit test, you need to validate what happened in the test. In your example you specified an OutfileWriter. You would need some mechanism to verify that the file and its data are good. You really want to automate this, so you might want to have something like:
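const fs = require('fs');

class OutFileValidator {
    // Returns true when the file holds exactly the expected records.
    // Assumes one record per line, each terminated by "\n".
    isCorrect(fName, dataList) {
        const expected = dataList.map((record) => record + '\n').join('');
        return fs.readFileSync(fName, 'utf8') === expected;
    }
}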

You might review "Taming the Beast", a presentation by Markus Clermont and John Thomas about automated testing of AJAX applications (available as a YouTube video).
Very rough summary of a relevant piece: you want to use the smallest testing technique you can for any specific verification. Spelling the same idea another way, you are trying to minimize the time required to run all of the tests, without sacrificing any information.
The larger tests, therefore are mostly about making sure that the plumbing is right - is Tab A actually in slot A, rather than slot B; do both components agree that length is measured in meters, rather than feet, and so on.
There's going to be duplication in which code paths are executed, and possibly you will reuse some of the setup and verification code, but I wouldn't normally expect your integration tests to include the same level of combinatoric explosion that would happen at a unit level.
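A "plumbing" test of this kind for the Controller / Repository / OutfileWriter example from the question could look roughly like the sketch below. The three classes are minimal hypothetical stand-ins written in Python with sqlite3; the test wires the real objects together and checks a single representative case rather than every combination.

```python
import os
import sqlite3
import tempfile

class Repository:
    def __init__(self, conn):
        self.conn = conn

    def fetch(self, ids):
        marks = ",".join("?" * len(ids))
        sql = f"SELECT name FROM record WHERE id IN ({marks}) ORDER BY id"
        return [row[0] for row in self.conn.execute(sql, ids)]

class OutfileWriter:
    def __init__(self, path):
        self.path = path

    def write(self, records):
        with open(self.path, "w") as out:
            out.writelines(record + "\n" for record in records)

class Controller:
    def __init__(self, repository, writer):
        self.repository, self.writer = repository, writer

    def export(self, ids):
        self.writer.write(self.repository.fetch(ids))

def run_wiring_test():
    """Wire the real objects together and return what ended up in the file."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO record VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.txt")
        Controller(Repository(conn), OutfileWriter(path)).export([1, 3])
        with open(path) as written:
            return written.read()
```

Edge cases such as empty id lists or odd record contents stay in the unit tests of the individual classes.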

Driving your TDD with BDD would cover most of this for you. You can use Cucumber / SpecFlow, with Watir / WatiN. Each feature has one or more scenarios; you work on one scenario (behaviour) at a time, and when it passes, you move on to the next scenario until the feature is complete.
To complete a scenario, you have to use TDD to drive the code necessary to make each step in the current scenario pass. The scenarios are agnostic to your back end implementation, however they verify that your implementation works; if there is something that isn't working in the web app for that feature, the behaviour needs to be in a scenario.
You can of course use integration testing, as others pointed out.

Database data needed in integration tests; created by API calls or using imported data?

This question is more or less programming-language agnostic. However, as I'm mostly into Java these days, that's where I'll draw my examples from. I'm also thinking about the OOP case, so if you want to test a method you need an instance of that method's class.
A core rule for unit tests is that they should be autonomous, and that can be achieved by isolating a class from its dependencies. There are several ways to do it and it depends on if you inject your dependencies using IoC (in the Java world we have Spring, EJB3 and other frameworks/platforms which provide injection capabilities) and/or if you mock objects (for Java you have JMock and EasyMock) to separate a class being tested from its dependencies.
If we need to test groups of methods in different classes* and see that they are well integrated, we write integration tests. And here is my question!
At least in web applications, state is often persisted to a database. We could use the same tools as for unit tests to achieve independence from the database. But in my humble opinion I think that there are cases when not using a database for integration tests is mocking too much (but feel free to disagree; not using a database at all, ever, is also a valid answer as it makes the question irrelevant).
When you use a database for integration tests, how do you fill that database with data? I can see two approaches:
Store the database contents for the integration test and load it before starting the test. If it's stored as an SQL dump, a database file, XML or something else would be interesting to know.
Create the necessary database structures by API calls. These calls are probably split up into several methods in your test code and each of these methods may fail. It could be seen as your integration test having dependencies on other tests.
How are you making certain that database data needed for tests is there when you need it? And why did you choose the method you chose?
Please provide an answer with a motivation, as it's in the motivation the interesting part lies. Remember that just saying "It's best practice!" isn't a real motivation, it's just re-iterating something you've read or heard from someone. For that case please explain why it's best practice.
*I'm including one method calling other methods in (the same or other) instances of the same class in my definition of unit test, even though it might technically not be correct. Feel free to correct me, but let's keep it as a side issue.

I prefer creating the test data using API calls.
At the beginning of the test, you create an empty database (in-memory or the same kind that is used in production), run the install script to initialize it, and then create whatever test data the test needs. Creation of the test data may be organized, for example, with the Object Mother pattern, so that the same data can be reused in many tests, possibly with minor variations.
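A small sketch of the Object Mother idea, assuming Python and an in-memory sqlite3 database. `CustomerMother`, its methods and the `customer` table are hypothetical names, and the single CREATE TABLE statement stands in for the real install script.

```python
import sqlite3

class CustomerMother:
    """Object Mother (hypothetical): one place that knows how to build valid test customers."""

    def __init__(self, conn):
        self.conn = conn

    def customer(self, name="Jane Doe", age=40, country="GB"):
        cur = self.conn.execute(
            "INSERT INTO customer (name, age, country) VALUES (?, ?, ?)",
            (name, age, country),
        )
        return cur.lastrowid

    def pensioner(self, **overrides):
        # A named variation; individual tests can still override single fields.
        return self.customer(**{"name": "Old Tom", "age": 70, **overrides})

def fresh_database():
    conn = sqlite3.connect(":memory:")
    # Stand-in for running the install script.
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, country TEXT)")
    return conn

conn = fresh_database()
mother = CustomerMother(conn)
mother.customer()
mother.pensioner(country="FR")
rows = conn.execute("SELECT name, age, country FROM customer ORDER BY id").fetchall()
```

If a column is later added to the schema, only `customer` needs to change; the tests that call `pensioner()` stay as they are.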
You want to have the database in a known state before every test, in order to have reproducible tests without side effects. So when a test ends, you should drop the test database or roll back the transaction, so that the next test can recreate the test data the same way every time, regardless of whether the previous tests passed or failed.
The reason why I would avoid importing database dumps (or similar), is that it will couple the test data with the database schema. When the database schema changes, you would also need to change or recreate the test data, which may require manual work.
If the test data is specified in code, you will have the power of your IDE's refactoring tools at your hand. When you make a change which affects the database schema, it will probably also affect the API calls, so you will anyways need to refactor the code using the API. With nearly the same effort you can also refactor the creation of the test data - especially if the refactoring can be automated (renames, introducing parameters etc.). But if the tests rely on a database dump, you would need to manually refactor the database dump in addition to refactoring the code which uses the API.
Another thing related to integration testing the database, is testing that upgrading from a previous database schema works right. For that you might want to read the book Refactoring Databases: Evolutionary Database Design or this article: http://martinfowler.com/articles/evodb.html

In integration tests, you need to test with a real database, as you have to verify that your application can actually talk to the database. Isolating the database as a dependency means that you are postponing the real test of whether your database was deployed properly, whether your schema is as expected, and whether your app is configured with the right connection string. You don't want to find any problems with these when you deploy to production.
You also want to test with both pre-created data sets and an empty data set. You need to test the path where your app starts with an empty database containing only your default initial data and then creates and populates the data, and also the path where it starts with well-defined data sets that target specific conditions you want to test, like stress, performance and so on.
Also, make sure that you have the database in a well-known state before each test. You don't want to have dependencies between your integration tests.

Why are these two approaches defined as being mutually exclusive?
I can't see any viable argument for not using pre-existing data sets, especially particular data that has caused problems in the past.
I can't see any viable argument for not programmatically extending that data with all the possible conditions that you can imagine causing problems and even a bit of random data for integration testing.
In modern agile approaches, Unit tests are where it really matters that the same tests are run each time. This is because unit tests are aimed not at finding bugs but at preserving the functionality of the app as it is developed, allowing the developer to refactor as needed.
Integration tests, on the other hand, are designed to find the bugs you did not expect. Running with some different data each time can even be good, in my opinion. You just have to make sure your test preserves the failing data if you get a failure. Remember, in formal integration testing, the application itself will be frozen except for bug fixes, so your tests can be changed to test for the maximum possible number and kinds of bugs. In integration, you can and should throw the kitchen sink at the app.
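Preserving the failing data can be as simple as recording the random seed and the input that failed. Here is a small sketch of that idea; `apply_discount` is a made-up function under test and the invariant checked is just an example.

```python
import random
import time


def apply_discount(price_cents, percent):
    """Hypothetical function under test."""
    return price_cents - (price_cents * percent) // 100


def run_randomized_check(seed=None, iterations=1000):
    """Feed random inputs to apply_discount().

    Returns (seed, failing_input), where failing_input is None on success.
    """
    if seed is None:
        seed = int(time.time())  # different data on each run
    rng = random.Random(seed)
    for _ in range(iterations):
        price = rng.randint(0, 1_000_000)
        percent = rng.randint(0, 100)
        result = apply_discount(price, percent)
        if not (0 <= result <= price):
            return seed, (price, percent)
    return seed, None


seed, failure = run_randomized_check()
if failure is not None:
    # Keep what is needed to reproduce the run: the seed and the failing input.
    print(f"FAILED with seed={seed}, input={failure}")
else:
    print(f"passed with seed={seed}")
```

Re-running with `run_randomized_check(seed=<logged seed>)` replays exactly the same inputs, so a failure found by chance becomes a repeatable test.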
As others have noted, of course, all this naturally depends on the kind of application that you are developing and the kind of organization you are in, etc.

It sounds like your question is actually two questions. Should you exclude the database from your testing? If you do use a database, how should you generate the data in it?
When possible I prefer to use an actual database. Frequently the queries (written in SQL, HQL, etc.) in CRUD classes can return surprising results when confronted with an actual database. It's better to flush these issues out early on. Often developers will write very thin unit tests for CRUD, testing only the most benign cases. Using an actual database for your testing can exercise all kinds of corner cases you may not have even been aware of.
That being said, there can be other issues. After each test you want to return your database to a known state. At my current job we nuke the database by executing all the DROP statements and then completely recreating all the tables from scratch. This is extremely slow on Oracle, but can be very fast if you use an in-memory database like HSQLDB. When we need to flush out Oracle-specific issues we just change the database URL and driver properties and then run against Oracle. If you don't have this kind of database portability, Oracle also has some kind of database snapshot feature which can be used specifically for this purpose: rolling back the entire database to some previous state. I'm not sure what other databases have.
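To give an idea of the "nuke and recreate" step, here is a sketch using Python and in-memory SQLite instead of HSQLDB. The schema is invented for the example, and listing tables through `sqlite_master` is specific to SQLite; other databases expose their catalog differently.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    total INTEGER NOT NULL
);
"""


def nuke_and_recreate(conn):
    """Drop every user table, then rebuild the schema from scratch."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    for table in tables:
        conn.execute(f'DROP TABLE "{table}"')
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
nuke_and_recreate(conn)  # the first call just creates the schema
conn.execute("INSERT INTO customer (name) VALUES ('left over from a test')")
nuke_and_recreate(conn)  # back to a known, empty state
```

Because the function only needs a connection, pointing the same tests at a different database is mostly a matter of changing how `conn` is created, plus the catalog query.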
Depending on what kind of data will be in your database, the API or the load approach may work better or worse. When you have highly structured data with many relations, APIs will make your life easier by making the relations between your data explicit. It will be harder for you to make a mistake when creating your test data set. As mentioned by other posters, refactoring tools can take care of some of the changes to the structure of your data automatically. Often I find it useful to think of API-generated test data as composing a scenario: a user/system has done steps X, Y and Z, and the tests go from there. These states can be achieved because you can write a program that calls the same API your user would use.
Loading data becomes much more important when you need large volumes of data, you have few relations within your data, or there is consistency in the data that cannot be expressed using APIs or standard relational mechanisms. At one job I worked at, my team was writing the reporting application for a large network packet inspection system. The volume of data was quite large for the time. In order to trigger a useful subset of test cases we really needed test data generated by the sniffers. This way the information about one protocol would correlate with the information about another protocol. It was difficult to capture this in the API.
Most databases have tools to import and export delimited text files of tables. But often you only want subsets of them, which makes using data dumps more complicated. At my current job we need to take some dumps of actual data which gets generated by Matlab programs and stored in the database. We have a tool which can dump a subset of the database data and then compare it with the "ground truth" for testing. It seems our extraction tools are constantly being modified.
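A subset dump compared against a stored "ground truth" could look roughly like the sketch below. This is not the tool described above; the `measurement` table, its rows and the expected text are all invented for the example.

```python
import csv
import io
import sqlite3


def dump_subset(conn, query, params=()):
    """Return the rows selected by `query` as delimited text, header first."""
    cur = conn.execute(query, params)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur.fetchall())
    return out.getvalue()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurement (run_id INTEGER, sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?)",
    [(1, "a", 0.5), (1, "b", 1.5), (2, "a", 9.0)],
)

# In practice this would be read from a file kept under version control.
GROUND_TRUTH = "run_id,sensor,value\n1,a,0.5\n1,b,1.5\n"

actual = dump_subset(
    conn,
    "SELECT run_id, sensor, value FROM measurement WHERE run_id = ? ORDER BY sensor",
    (1,),
)
assert actual == GROUND_TRUTH
```

The explicit column list and `ORDER BY` matter here: without them the dump can change when a column is added or the database returns rows in a different order, and the comparison fails for reasons unrelated to the code under test.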

I've used DBUnit to take snapshots of records in a database and store them in XML format. Then my unit tests (we called them integration tests when they required a database), can wipe and restore from the XML file at the start of each test.
I'm undecided whether this is worth the effort. One problem is dependencies on other tables. We left static reference tables alone, and built some tools to detect and extract all child tables along with the requested records. I read someone's recommendation to disable all foreign keys in your integration test database. That would make it way easier to prepare the data, but you're no longer checking for any referential integrity problems in your tests.
Another problem is database schema changes. We wrote some tools that would add default values for columns that had been added since the snapshots were taken.
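The snapshot-and-restore idea, including default values for columns added later, can be sketched in a few lines of Python. This is not DBUnit and does not use its API; the XML layout (one element per row, columns as attributes) and the `person` table are assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def take_snapshot(conn, table):
    """Serialize all rows of `table` as XML, one element per row."""
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [c[0] for c in cur.description]
    root = ET.Element("dataset")
    for row in cur:
        attrs = {c: str(v) for c, v in zip(columns, row) if v is not None}
        ET.SubElement(root, table, attrs)
    return ET.tostring(root, encoding="unicode")


def restore_snapshot(conn, xml_text, defaults=None):
    """Wipe the tables named in the snapshot and reload their rows.

    `defaults` maps table name -> {column: value} for columns that were
    added to the schema after the snapshot was taken.
    """
    defaults = defaults or {}
    root = ET.fromstring(xml_text)
    for table in {el.tag for el in root}:
        conn.execute(f'DELETE FROM "{table}"')
    for el in root:
        values = {**defaults.get(el.tag, {}), **el.attrib}
        cols = ", ".join(f'"{c}"' for c in values)
        marks = ", ".join("?" for _ in values)
        conn.execute(
            f'INSERT INTO "{el.tag}" ({cols}) VALUES ({marks})',
            list(values.values()),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'Ann')")
snapshot = take_snapshot(conn, "person")

# The schema changes after the snapshot, then a test leaves a stray row behind.
conn.execute("ALTER TABLE person ADD COLUMN active INTEGER")
conn.execute("INSERT INTO person VALUES (2, 'stray row', 0)")

restore_snapshot(conn, snapshot, defaults={"person": {"active": 1}})
rows = conn.execute("SELECT id, name, active FROM person").fetchall()
```

The sketch ignores child tables and foreign keys entirely, which is exactly the part that took real tooling in our case.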
Obviously these tests were way slower than pure unit tests.
When you're trying to test some legacy code where it's very difficult to write unit tests for individual classes, this approach may be worth the effort.

I do both, depending on what I need to test:
I import static test data from SQL scripts or DB dumps. This data is used in object load (deserialization or object mapping) and in SQL query tests (when I want to know whether the code will return the correct result).
Plus, I usually have some backbone data (config, value to name lookup tables, etc). These are also loaded in this step. Note that this loading is a single test (along with creating the DB from scratch).
When I have code which modifies the DB (object -> DB), I usually run it against a living DB (in memory or a test instance somewhere). This is to ensure that the code works; not to create any large amount of rows. After the test, I rollback the transaction (following the rule that tests must not modify the global state).
Of course, there are exceptions to the rule:
I also create large numbers of rows in performance tests.
Sometimes, I have to commit the result of a unit test (otherwise, the test would grow too big).
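The rollback-after-each-test rule can be shown with a short sketch. The `item` table and the `save_item` function are hypothetical, and an in-memory SQLite connection plays the role of the shared test instance.

```python
import sqlite3
import unittest

# Shared connection with some committed backbone data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO item (name) VALUES ('backbone row')")
conn.commit()


def save_item(connection, name):
    """Hypothetical code under test: writes an object to the DB."""
    connection.execute("INSERT INTO item (name) VALUES (?)", (name,))


class SaveItemTest(unittest.TestCase):
    def tearDown(self):
        # Undo whatever the test wrote so the global state is unchanged.
        conn.rollback()

    def test_save_item_inserts_one_row(self):
        save_item(conn, "widget")
        names = [r[0] for r in conn.execute("SELECT name FROM item ORDER BY id")]
        self.assertEqual(names, ["backbone row", "widget"])


def run_suite():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SaveItemTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```

This only works while the code under test does not commit on its own; code that commits internally is one of the cases where the exceptions above apply.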

I generally use SQL scripts to fill the data in the scenario you discuss.
It's straight-forward and very easily repeatable.

This will probably not answer all your questions, if any, but I made the decision in one project to do unit testing against the DB. I felt in my case that the DB structure needed testing too, i.e. did my DB design deliver what is needed for the application. Later in the project when I feel the DB structure is stable, I will probably move away from this.
To generate data I decided to create an external application that filled the DB with "random" data. I created person-name and company-name generators, etc.
The reasons for doing this in an external program were:
1. I could rerun the tests on data already modified by earlier test runs, i.e. make sure my tests were able to run several times and that the data modifications made by the tests were valid modifications.
2. I could, if needed, clean the DB and get a fresh start.
I agree that there are points of failure in this approach, but in my case, since e.g. person generation was part of the business logic, generating data for tests was actually testing that part too.
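A stripped-down version of such a generator might look like this. The name lists, table layout and the `fill_database` function are all invented for the illustration; the real program was more elaborate.

```python
import random
import sqlite3

FIRST_NAMES = ["Anna", "Ben", "Carla", "David", "Eva"]
LAST_NAMES = ["Berg", "Lind", "Moore", "Novak", "Price"]
COMPANY_WORDS = ["Northern", "Blue", "Rapid", "Union", "Delta"]
COMPANY_SUFFIXES = ["Ltd", "Inc", "Group"]


def person_name(rng):
    return f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"


def company_name(rng):
    return f"{rng.choice(COMPANY_WORDS)} {rng.choice(COMPANY_SUFFIXES)}"


def fill_database(conn, people=100, companies=10, seed=None, clean=False):
    """Add random people and companies; with clean=True, start from empty tables."""
    rng = random.Random(seed)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS company (id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS person ("
        "id INTEGER PRIMARY KEY, name TEXT, "
        "company_id INTEGER REFERENCES company(id))"
    )
    if clean:
        conn.execute("DELETE FROM person")
        conn.execute("DELETE FROM company")
    company_ids = [
        conn.execute(
            "INSERT INTO company (name) VALUES (?)", (company_name(rng),)
        ).lastrowid
        for _ in range(companies)
    ]
    for _ in range(people):
        conn.execute(
            "INSERT INTO person (name, company_id) VALUES (?, ?)",
            (person_name(rng), rng.choice(company_ids)),
        )
    conn.commit()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    fill_database(db, people=20, companies=3)
```

Running it without `clean=True` adds to whatever earlier test runs left behind (reason 1), while `clean=True` gives the fresh start (reason 2).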

Our team confronted the same question recently.
Before, we were using SpecFlow to do integration testing. With SpecFlow, QA can write each test case so that it populates the necessary test data into the DB itself.
Now QA wants to use Postman to test the API, so how can they populate the data? One solution is creating APIs for populating it. Another is syncing historical data from Prod to the test environment.
I will update my answer once we have tried the different solutions and decided which one to go with.

Automated integration testing a C++ app with a database

I am introducing automated integration testing to a mature application that until now has only been manually tested.
The app is Windows based and talks to a MySQL database.
What is the best way (including details of any tools recommended) to keep tests independent of each other in terms of the database transactions that will occur?
(Modifications to the app source for this particular purpose are not an option.)

How are you verifying the results?
If you need to query the DB (and it sounds like you probably do) for results then I agree with Kris K, except I would endeavor to rebuild the DB after every test case, not just every suite.
This helps avoid dangerous interactions between tests.
As for tools, I would recommend CppUnit. You aren't really doing unit tests, but it shouldn't matter, as the xUnit framework should give you the setup and teardown framework you'll need to automatically set up your test fixture.
Obviously this can result in slow-running tests, depending on your database size, population etc. You may be able to attach/detach databases rather than dropping/rebuilding.
If you're interested in further research, check out XUnit Test Patterns. It's a fine book and a good website for this kind of thing.
And thanks for automating :)
Nick

You can dump/restore the database for each test suite, etc. Since you are automating this, it may be something in the setup/teardown functionality.

I used to restore the database in the SetUp function of the database-related unit test class. This ensured that each test ran under the same conditions.
You may consider preparing special database content for the tests, i.e. with less data than the current production version (to keep the restore times reasonable).

The best environment for such testing, I believe, is VMWare or an equivalent. Set up your database, transaction log and so on, then record the whole lot - database as well as configuration. Then to re-test, reload the image and database and kick off the tests. This still requires maintenance of the tests as the system changes, but at least the tests are repeatable, which is one of your greatest challenges in integration testing.
For test automation, many people use Perl, but we've found that Perl programs grow like Topsy and become convoluted. The use of Python as a scripting language (we run C++ tests) is worthwhile if you're trying to build a series of structured tests.
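As a sketch of what such a Python driver might look like for a MySQL-backed C++ app: restore a dump, then run one test executable, and repeat per test case. The executable name, dump path, database and user are placeholders, and the sketch assumes the stock `mysql` command-line client is on the PATH with the password supplied through an option file rather than on the command line.

```python
import subprocess


def restore_command(database, user, host="localhost"):
    """Command line for the stock `mysql` client; the dump is fed on stdin."""
    return ["mysql", f"--host={host}", f"--user={user}", database]


def run_case(test_exe, dump_path, database, user, runner=subprocess.run):
    """Restore the dump, then run one C++ test executable.

    Returns True if the executable exits with status 0.
    """
    with open(dump_path, "rb") as dump:
        # check=True aborts the run if the restore itself fails.
        runner(restore_command(database, user), stdin=dump, check=True)
    result = runner([test_exe], check=False)
    return result.returncode == 0


if __name__ == "__main__":
    # Placeholder names; replace with the real executable, dump and database.
    ok = run_case("./orders_test", "baseline.sql", "app_test", "tester")
    print("PASS" if ok else "FAIL")
```

Restoring before every case keeps the tests independent without touching the application source, at the cost of run time, so a small baseline dump helps.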

As Kris K. says, dumping and restoring the database between each test will probably be the way to go.
Since you are looking at doing testing external to the App I would look to build the testing framework in a language where you can take advantage of better testing tools.
If you built the testing framework in Java you could take advantage of JUnit and potentially even something like FitNesse.
Don't think that just because the application under test is C++ that means you are stuck using C++ for your automated testing.

Please try AnyDbTest; I think it is the very tool you are looking for (www.anydbtest.com).
Features:
1. Test cases are written in XML, not Java/C++/C#/VB code, so there is no need for expensive programming tools.
2. Supports all popular databases, such as Oracle, SQL Server and MySQL.
3. Supports many kinds of assertions, such as StrictEqual, SetEqual, IsSupersetOf, Overlaps and RecordCountEqual. In addition, most assertions can be prefixed with a logical NOT operator.
4. Allows using an Excel spreadsheet or XML as the source of the data for the tests. An Excel spreadsheet makes it easy to create, edit and maintain the test data.
5. Supports a sandbox test model: if a test is run in the sandbox, all database operations on each DB are rolled back, meaning any changes are undone.
6. Allows pumping data from one database or Excel file into the target database in the test initialization and finalization phases. This is an easy way to prepare the test data.
7. Cross-database testing across different database types, which means the target and reference result sets can come from two databases, even if one is SQL Server and the other is Oracle.
8. Set-style comparison for record sets. AnyDbTest will tell you the intersection, surplus or absence between the two record sets.
9. Sequential-style comparison for record sets or scalar values, meaning the two result sets are compared in their original order.
10. Allows exporting the result set of an SQL statement into an XML or Excel file.
