Comparing the schema of two databases for integration testing

Comparing the schema of two databases for integration testing - unit-testing

We use NHibernate generated schema to run unit tests against a database (integration tests I guess they are). I wondered if it was feasible to compare the generated schema against our development database. This would tell us when we had misspelt column names in our mappings or other issues like that. It would also go a long way toward keeping keys and the like consistent across the two.
Is this kind of automated compare feasible? How is the best way to go about doing it?

If you are unable to find a solution using nhibernate, you could look into something like RedGate's SQL Compare tool. This tool makes it incredibly easy to perform comparisons on different databases and see the schema differences. They also have a software development kit that allows you to leverage the power of SQL Compare in your own applications (something I have not yet gotten into, but would love to if the need ever arose).

Related

Automated testing in RPG (or other ILE languages)

we have quite a lot of RPG-programs here, and we do a lot of automated testing, but we are not very good yet in combining those two. Are there good ways to do automated testing on RPG programs -- or on any other ILE programs for that matter?
I am aware of a project named RPGUnit, but that has it's last update in 2007. However, it seems it is still used, since RPG Next Gen is currently putting some work in including it.
What's you experience with those? Is there something else, that I am missing, like some great sofware tool google fails to find?
I'm interested in unit testing as well as integration testing of complete projects. Anything that integrates with tools like jenkins is welcome. If it involves IBM's Rational Developer or System i Navigator, that's okay, too.
We are in an early phase of creating new testing plans for our RPG development process, and I don't want it, to head in the wrong direction right from the start.

You probably already know how broad a subject 'testing' can be. IBM have a product called Rational Function Tester (I haven't used it) http://www-01.ibm.com/software/awdtools/tester/functional/ I myself use RPGUnit. No, it hasn't been updated recently but it still has all the pieces needed for testing subprocedures in the same way one would test Java methods.
Frankly, that's the easy part. The hard part is creating a test database and keeping it current enough to be representative of the production database. Rodin have some database tooling but I haven't the budget for those, so I roll my own more or less by hand. I use many SQL statements in a CL program to extract production data so I can maintain referential integrity. Then I use some more SQL to add my exceptional test cases - those relationships which aren't present in the production data but need to be tested for. Then I make a complete copy of the test database as a reference point. Then I run my test cases, which will update the test database. I've written a home grown CMPPFM utility that will allow me to compare the reference database against the now modified test database. This will show changes, but it still needs a lot of manual labour to review the comparisons to ensure that the proper rows got the proper updates. I haven't gone the extra mile to automate that yet. One big caveat is there are some columns you don't care about, like a change timestamp.

We went with RPGUNIT, and found it a good base to work from, but ended up extending it a lot to tie in with our Change Management system, and the way we work. I've written about the things we tried here: http://www.littlebluemonkey.com/blog/my-rpg-unit-test-journey

Database data needed in integration tests; created by API calls or using imported data?

This question is more or less programming language agnostic. However as I'm mostly into Java these days that's where I'll draw my examples from. I'm also thinking about the OOP case, so if you want to test a method you need an instance of that methods class.
A core rule for unit tests is that they should be autonomous, and that can be achieved by isolating a class from its dependencies. There are several ways to do it and it depends on if you inject your dependencies using IoC (in the Java world we have Spring, EJB3 and other frameworks/platforms which provide injection capabilities) and/or if you mock objects (for Java you have JMock and EasyMock) to separate a class being tested from its dependencies.
If we need to test groups of methods in different classes* and see that they are well integration, we write integration tests. And here is my question!
At least in web applications, state is often persisted to a database. We could use the same tools as for unit tests to achieve independence from the database. But in my humble opinion I think that there are cases when not using a database for integration tests is mocking too much (but feel free to disagree; not using a database at all, ever, is also a valid answer as it makes the question irrelevant).
When you use a database for integration tests, how do you fill that database with data? I can see two approaches:
Store the database contents for the integration test and load it before starting the test. If it's stored as an SQL dump, a database file, XML or something else would be interesting to know.
Create the necessary database structures by API calls. These calls are probably split up into several methods in your test code and each of these methods may fail. It could be seen as your integration test having dependencies on other tests.
How are you making certain that database data needed for tests is there when you need it? And why did you choose the method you choose?
Please provide an answer with a motivation, as it's in the motivation the interesting part lies. Remember that just saying "It's best practice!" isn't a real motivation, it's just re-iterating something you've read or heard from someone. For that case please explain why it's best practice.
*I'm including one method calling other methods in (the same or other) instances of the same class in my definition of unit test, even though it might technically not be correct. Feel free to correct me, but let's keep it as a side issue.

I prefer creating the test data using API calls.
In the beginning of the test, you create an empty database (in-memory or the same that is used in production), run the install script to initialize it, and then create whatever test data used by the database. Creation of the test data may be organized for example with the Object Mother pattern, so that the same data can be reused in many tests, possibly with minor variations.
You want to have the database in a known state before every test, in order to have reproducable tests without side effects. So when a test ends, you should drop the test database or roll back the transaction, so that the next test could recreate the test data always the same way, regardless of whether the previous tests passed or failed.
The reason why I would avoid importing database dumps (or similar), is that it will couple the test data with the database schema. When the database schema changes, you would also need to change or recreate the test data, which may require manual work.
If the test data is specified in code, you will have the power of your IDE's refactoring tools at your hand. When you make a change which affects the database schema, it will probably also affect the API calls, so you will anyways need to refactor the code using the API. With nearly the same effort you can also refactor the creation of the test data - especially if the refactoring can be automated (renames, introducing parameters etc.). But if the tests rely on a database dump, you would need to manually refactor the database dump in addition to refactoring the code which uses the API.
Another thing related to integration testing the database, is testing that upgrading from a previous database schema works right. For that you might want to read the book Refactoring Databases: Evolutionary Database Design or this article: http://martinfowler.com/articles/evodb.html

In integration tests, you need to test with real database, as you have to verify that your application can actually talk to the database. Isolating the database as dependency means that you are postponing the real test of whether your database was deployed properly, your schema is as expected and your app is configured with the right connection string. You don't want to find any problems with these when you deploy to production.
You also want to test with both precreated data sets and empty data set. You need to test both path where your app starts with an empty database with only your default initial data and starts creating and populating the data and also with a well-defined data sets that target specific conditions you want to test, like stress, performance and so on.
Also, make sur that you have the database in a well-known state before each state. You don't want to have dependencies between your integration tests.

Why are these two approaches defined as being exclusively?
I can't see any viable argument for
not using pre-existing data sets, especially particular data that has
caused problems in the past.
I can't
see any viable argument for not
programmatically extending that data with
all the possible conditions that
you can imagine causing problems and even a
bit of random data for integration
testing.
In modern agile approaches, Unit tests are where it really matters that the same tests are run each time. This is because unit tests are aimed not at finding bugs but at preserving the functionality of the app as it is developed, allowing the developer to refactor as needed.
Integration tests, on the other hand, are designed to find the bugs you did not expect. Running with some different data each time can even be good, in my opinion. You just have to make sure your test preserves the failing data if you get a failure. Remember, in formal integration testing, the application itself will be frozen except for bug fixes so your tests can be change to test for the maximum possible number and kinds of bugs. In integration, you can and should throw the kitchen sink at the app.
As others have noted, of course, all this naturally depends on the kind of application that you are developing and the kind of organization you are in, etc.

It sounds like your question is actually two questions. Should you exclude the database from your testing? When you do a database, then how should you generate the data in the database?
When possible I prefer to use an actual database. Frequently the queries (written in SQL, HQL, etc.) in CRUD classes can return surprising results when confronted with an actual database. It's better to flush these issues out early on. Often developers will write very thin unit tests for CRUD; testing only the most benign cases. Using an actual database for your testing can test all kinds corner cases you may not have even been aware of.
That being said there can be other issues. After each test you want to return your database to a known state. It my current job we nuke the database by executing all the DROP statements and then completely recreating all the tables from scratch. This is extremely slow on Oracle, but can be very fast if you use an in memory database like HSQLDB. When we need to flush out Oracle specific issues we just change the database URL and driver properties and then run against Oracle. If you don't have this kind of database portability then Oracle also has some kind of database snapshot feature which can be used specifically for this purpose; rolling back the entire database to some previous state. I'm sure what other databases have.
Depending on what kind of data will be in your database the API or the load approach may work better or worse. When you have highly structured data with many relations, APIs will make your life easier my making the relations between your data explicit. It will be harder for you to make a mistake when creating your test data set. As mentioned by other posters refactoring tools can take care of some of the changes to structure of your data automatically. Often I find it useful to think of API generated test data as composing a scenario; when a user/system has done steps X, Y Z and then tests will go from there. These states can be achieved because you can write a program that calls the same API your user would use.
Loading data becomes much more important when you need large volumes of data, you have few relations between within your data or there is consistency in the data that can not be expressed using APIs or standard relational mechanisms. At one job that at worked at my team was writing the reporting application for a large network packet inspection system. The volume of data was quite large for the time. In order to trigger a useful subset of test cases we really needed test data generated by the sniffers. This way correlations between the information about one protocol would correlate with information about another protocol. It was difficult to capture this in the API.
Most databases have tools to import and export delimited text files of tables. But often you only want subsets of them; making using data dumps more complicated. At my current job we need to take some dumps of actual data which gets generated by Matlab programs and stored in the database. We have tool which can dump a subset of the database data and then compare it with the "ground truth" for testing. It seems our extraction tools are being constantly modified.

I've used DBUnit to take snapshots of records in a database and store them in XML format. Then my unit tests (we called them integration tests when they required a database), can wipe and restore from the XML file at the start of each test.
I'm undecided whether this is worth the effort. One problem is dependencies on other tables. We left static reference tables alone, and built some tools to detect and extract all child tables along with the requested records. I read someone's recommendation to disable all foreign keys in your integration test database. That would make it way easier to prepare the data, but you're no longer checking for any referential integrity problems in your tests.
Another problem is database schema changes. We wrote some tools that would add default values for columns that had been added since the snapshots were taken.
Obviously these tests were way slower than pure unit tests.
When you're trying to test some legacy code where it's very difficult to write unit tests for individual classes, this approach may be worth the effort.

I do both, depending on what I need to test:
I import static test data from SQL scripts or DB dumps. This data is used in object load (deserialization or object mapping) and in SQL query tests (when I want to know whether the code will return the correct result).
Plus, I usually have some backbone data (config, value to name lookup tables, etc). These are also loaded in this step. Note that this loading is a single test (along with creating the DB from scratch).
When I have code which modifies the DB (object -> DB), I usually run it against a living DB (in memory or a test instance somewhere). This is to ensure that the code works; not to create any large amount of rows. After the test, I rollback the transaction (following the rule that tests must not modify the global state).
Of course, there are exceptions to the rule:
I also create large amount of rows in performance tests.
Sometimes, I have to commit the result of a unit test (otherwise, the test would grow too big).

I generally use SQL scripts to fill the data in the scenario you discuss.
It's straight-forward and very easily repeatable.

This will probably not answer all your questions, if any, but I made the decision in one project to do unit testing against the DB. I felt in my case that the DB structure needed testing too, i.e. did my DB design deliver what is needed for the application. Later in the project when I feel the DB structure is stable, I will probably move away from this.
To generate data I decided to create an external application that filled the DB with "random" data, I created a person-name and company-name generators etc.
The reason for doing this in an external program was:
1. I could rerun the tests on by test modified data, i.e. making sure my tests were able to run several times and the data modification made by the tests were valid modifications.
2. I could if needed, clean the DB and get a fresh start.
I agree that there are points of failure in this approach, but in my case since e.g. person generation was part of the business logic generating data for tests was actually testing that part too.

Our team confront the same question recently.
Before, we were using specflow to do integration testing. With specflow, QA can write each test case inside which populating necessary test data to DB.
Now, QA want to use postman to test API, how can they populate the data? One solution is creating Apis for populating them. Another is sync historical data from Prod to test env.
Will update my answer once we try different solutions and decide which one to go.

Query building in a database agnostic way

In a C++ application that can use just about any relational database, what would be the best way of generating queries that can be easily extended to allow for a database engine's eccentricities?
In other words, the code may need to retrieve data in a way that is not consistent among the various database engines. What's the best way to design the code on the client side to generate queries in a way that will make supporting a new database engine a relatively painless affair.
For example, if I have (MFC)code that looks like this:
CString query = "SELECT id FROM table"
results = dbConnection->Query(query);
and we decide to support some database that uses, um, "AVEC" instead of "FROM". Now whenever the user uses that database engine, this query will fail.
Options so far:
Worst option: have the code making the query check the database type.
Better option: Create query request method on the db connection object that takes a unique query "code" and returns the appropriate query based on the database engine in use.
Betterer option: Create a query builder class that allows the caller to construct queries without using any SQL directly. Once the query is completed, caller can invoke a "Generate" method which returns a query string approrpriate for the active database engine
Best option: ??
Note: The database engine itself is abstracted away through some thin layers of our own creation. It's the queries themselves are the only remaining problem.
Solution:
I've decided to go with the "better" option (query "selector") for two reasons.
Debugging: As mentioned below, debugging is going to be slightly easier with the selector approach since the queries are pre-built and listed out in a readable form in code.
Flexibility: It occurred to me that there are some databases which might have vastly better and completely different ways of solving a particular query. For example, with Access I perform a complicated query on multiple tables each time because I have to, but on Sql Server I'd like to setup a view. Selecting from the view and from several tables are completely different queries (i think) and this query selector would handle it easily.

You need your own query-writing object, which can be inherited from by database-specific implementations.
So you would do something like:
DbAgnosticQueryObject query = new PostgresSQLQuery();
query.setFrom('foo');
query.setSelect('id');
// and so on
CString queryString = query.toString();
It can get pretty complicated in there once you go past simple selects from a single table. There are already ORM packages out there that deal with a lot of these nuances; it may be worth at looking at them instead of writing your own.

Best option: Pick a database, and code to it.
How often are you going to up and swap out the database on the back end of a production system? And even if you did, you'd have a lot more to worry about than just minor syntax issues. (Major stuff like join syntax, even datatypes can differ widely between databases.)
Now, if you are designing a commercial application where you want the customer to be able to use one of several back-end options when they implement it, then you may have to specify "we support Oracle, MS SQl, or MYSQL" and code to those specific options.

All of your options can be reduced to
Worst option: have the code making the query check the database type.
It's just a matter of where you're putting the logic to check the database type.
The option that I've seen work best in practice is
Better option: Create query request method on the db connection object that takes a unique query "code" and returns the appropriate query based on the database engine in use.
In my experience it is much easier to test queries independently from the rest of your code. It gets a lot harder if you have objects that are piecing together queries from bits of syntax, because then you have to test the query-creation code and the query itself.
If you pull all of your SQL out into separate files that are written and maintained by hand, you can have someone who is an expert in SQL write them (you can still automate the testing of these queries). If you try to write query-generating functions you'll essentially have a C++ expert writing SQL.

Choose an ORM, and start mapping.
If you are to support more than one DB, your problem is only going to get worse.
And just think of DB that are comming - cloud dbs with no (or close to no) SQL, and Object databases.

Take your queries outside the code - put them in the DB or in a resource file and allow overrides for different database engines.
If you use SPs it's potentially even easier, since the SPs abstract away your database differences.

I would think that what you would want to do, if you needed the ability to support multiple databases, would be to create a data provider interface (or abstract class) and associated concrete implementations. The data provider would need to support your standard query operators and other common, supported functionality required support your query operations (have a look at IEnumerable extension methods in .NET 3.5). Each concrete provider would then translate these into specific queries based on the target database engine.
Essentially, what you do is create a database abstraction layer and have your code interact with it. If you can find one of these for C++, it would probably be worth buying instead of writing. You may also want to look for Inversion of Control (IoC) containers for C++ that would basically do this and more. I know of several for Java and C#, but I'm not familiar with any for C++.

When unit testing, do you have to use a database to test CRUD operations?

When unit testing, is it a must to use a database when testing CRUD operations?
Can sql lite help with this? Do you have to cre-create the db somehow in memory?
I am using mbunit.

No. Integrating an actual DB would be integration testing. Not unit testing.
Yes you could use any in-memory DB like SQLite or MS SQL Compact for this if you can't abstract (mock) your DAL/DAO in any other way.
With this in mind I have to point out, that unit testing is possible all the way to DAL, but not DAL itself. DAL will have to be tested with some sort of an actual DB in integration testing.

As with all complicated question, the answer is: It depends :)
In general you should hide your data access layer behind an interface so that you can test the rest of the application without using a database, but what if you would like to test the data access implementation itself?
In some cases, some people consider this redundant since they mostly use declarative data access technologies such as ORMs.
In other cases, the data access component itself may contain some logic that you may want to test. That can be an entirely relevant thing to do, but you will need the database to do that.
Some people consider this to be Integration Tests instead of Unit Tests, but in my book, it doesn't matter too much what you call it - the most important thing is that you get value out of your fully automated tests, and you can definitely use a unit testing framework to drive those tests.
A while back I wrote about how to do this on SQL Server. The most important thing to keep in mind is to avoid the temptation to create a General Fixture with some 'representative data' and attempt to reuse this across all tests. Instead, you should fill in data as part of each test and clean it up after.

When unit testing, is it a must to use a database when testing CRUD operations?
Assuming for a moment that you have extracted interfaces round said CRUD operations and have tested everything that uses said interface via mocks or stubs. You are now left with a chunk of code that is a save method containing a bit of code to bind objects and some SQL.
If so then I would declare that a "Unit" and say you do need a database, and ideally one that is at least a good representation of your database, lest you be caught out with vender specific SQL.
I'd also make light use of mocks in order to force error conditions, but I would not test the save method itself with just mocks. So while technically this may be an integration test I'd still do it as part of my unit tests.
Edit: Missed 2/3s of your question. Sorry.
Can sql lite help with this?
I have in the past used in memory databases and have been bitten as either the database I used and the live system did something different or they took quite some time to start up. I would recommend that every developer have a developer local database anyway.
Do you have to cre-create the db somehow in memory?
In the database yes. I use DbUnit to splatter data and manually keep the schema up to date with SQL scripts but you could use just SQL scripts. Having a developer local database does add some additional maintenance as you have both the schema and the datasets to keep up to data but personally I find is worth while as you can be sure that database layer is working as expected.

As others already pointed out, what you are trying to achieve isn't unit testing but integration testing.
Having that said, and even if I prefer unit testing in isolation with mocks, there is nothing really wrong with integration testing. So if you think it makes sense in your context, just include integration testing in your testing strategy.
Now, regarding your question, I'd check out DbUnit.NET. I don't know the .NET version of this tool but I can tell you that the Java version is great for tests interacting with a database. In a few words, DbUnit allows you to put the database in a known state before a test is run and to perform assert on the content of tables. Really handy. BTW, I'd recommend reading the Best Practices page, even if you decide to not use this tool.

Really, if you are writing a test that connects to a database, you are doing integration testing, not unit testing.
For unit testing such operations, consider using some typed of mock-database object. For instance, if you have a class that encapsulates your database interaction, extract an interface from it and then create an inheriting class that uses simple in-memory objects instead of actually connecting to the database.

As mentioned above, the key here is to have your test database in a known state before the tests are run. In one real-world example, I have a couple of SQL scripts that are run prior to the tests that recreate a known set of test data. From this, I can test CRUD operations and verify that the new row(s) are inserted/updated/deleted.

I wrote a utility called DBSnapshot to help integration test sqlserver databases.
If your database schema is changing frequently it will be helpful to actually test your code against a real db instance. People use SqlLite for speedy tests (because the database runs in memory), but this isn't helpful when you want to verify that your code works against an actual build of your database.
When testing your database you want to follow a pattern similar to: backup the database, setup the database for a test, exercise the code, verify results, restore database to the starting state.
The above will ensure that you can run each test in isolation. My DBSnapshot utility will simplify your code if your writing it .net. I think its easier to use than DbUnit.NET.

Automated integration testing a C++ app with a database

I am introducing automated integration testing to a mature application that until now has only been manually tested.
The app is Windows based and talks to a MySQL database.
What is the best way (including details of any tools recommended) to keep tests independent of each other in terms of the database transactions that will occur?
(Modifications to the app source for this particular purpose are not an option.)

How are you verifying the results?
If you need to query the DB (and it sounds like you probably do) for results then I agree with Kris K, except I would endeavor to rebuild the DB after every test case, not just every suite.
This helps avoid dangerous interacting tests
As for tools, I would recommend CppUnit. You aren't really doing unit tests, but it shouldn't matter as the xUnit framework should give you the set up and teardown framework you'll need to automatically set up your test fixture
Obviously this can result in slow-running tests, depending on your database size, population etc. You may be able to attach/detach databases rather than dropping/rebuilding.
If you're interested in further research, check out XUnit Test Patterns. It's a fine book and a good website for this kind of thing.
And thanks for automating :)
Nick

You can dump/restore the database for each test suite, etc. Since you are automating this, it may be something in the setup/teardown functionality.

I used to restore the database in the SetUp function of the database related unit test class. This way it was ensured that each test runs under the same conditions.
You may consider to prepare special database content for the tests, i.e. with less data than the current production version (to keep the restore times reasonable).

The best environment for such testing, I believe, is VMWare or an equivalent. Set up your database, transaction log and so on, then record the whole lot - database as well as configuration. Then to re-test, reload the image and database and kick off the tests. This still requires maintenance of the tests as the system changes, but at least the tests are repeatable, which is one of your greatest challenges in integration testing.
For test automation, many people use Perl, but we've found that Perl programs grow like Topsy and become convoluted. The use of Python as a scripting language (we run C++ tests) is worthwhile if you're trying to build a series of structured tests.

As #Kris K. says dumping and restoring the database between each test will probably be the way to go.
Since you are looking at doing testing external to the App I would look to build the testing framework in a language where you can take advantage of better testing tools.
If you built the testing framework in Java you could take advantage of JUnit and potentially even something like FitNesse.
Don't think that just because the application under test is C++ that means you are stuck using C++ for your automated testing.

Please try AnyDbTest, I think it is the very tool you are finding. (www.anydbtest.com).
Features:
1.Writing test case with Xml, not Java/C++/C#/VB code. Not need those expensive programming tools.
2.Supports all popular databases, such as Oracle/SQL Server/My SQL
3.So many kinds of assertion supported, such as StrictEqual, SetEqual, IsSupersetOf, Overlaps, and RecordCountEqual etc. Plus, most of assertions can prefix logic not operator.
4.Allows using an Excel spreadsheet/Xml as the source of the data for the tests. As you know, Excel spreadsheet is to easily create/edit and maintain the test data.
5.Supports sandbox test model, if one test will be done in sandbox, all database operations on each DB will be rolled back meaning any changes will be undone.
6.Allows performing data pump from one database/Excel into target database in testing initialization and finalization phase. This is easy way to prepare the test data for testing.
7.Unique cross-different-type-database testing, which means target and reference result set can come from two databases, even one is SQL Server, another is Oracle.
8.Set style comparison for recordset. AnyDbTest will tell you what is the intersection, or surplus or absence between the two record sets.
9.Sequential style comparison for recordset or scalar values. It means the two result set will be compared in their original sequence.
10.Allow to export result set of SQL statement into Xml/Excel file.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js