object cache for java unit testing

I am not sure what the correct technical term is for the test scenario I am thinking about, but here are the features I want during my unit testing:
Instead of going to the database, I want a framework that will store objects (serialized to disk) that I can pass to my unit test methods.
I should be able to create these objects from a DB source and save them for later use in my unit test cases.
The object store should be portable (like an HSQLDB file-based database that I can move around from system to system).
Is there a technical term for a framework/library that meets this requirement? Object database, object store, etc.? Please note that I am not trying to install/configure an entire database; rather, I want to be able to re-create an already created complex object structure and pass it to a JUnit test method.
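For illustration, plain Java serialization already covers most of what is described here; the sketch below assumes the object graph implements Serializable, and FixtureStore is a made-up helper name, not an existing library:

    import java.io.*;

    // Hypothetical helper: save an object graph fetched from the DB once,
    // then reload it in later test runs without touching the database.
    public class FixtureStore {

        // Serialize an object graph to a portable file that can be
        // checked in and moved from system to system.
        public static void save(Object fixture, File file) throws IOException {
            try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
                out.writeObject(fixture);
            }
        }

        // Recreate the object graph inside a unit test; no database required.
        @SuppressWarnings("unchecked")
        public static <T> T load(File file) throws IOException, ClassNotFoundException {
            try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
                return (T) in.readObject();
            }
        }
    }

A test's setup would then call something like FixtureStore.load(new File("customer-graph.ser")) instead of querying the database.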

Related

Suggestion on Creating Unit test for database layer

Thank you for reading my question.
I was just wondering how I should create unit tests for an existing database layer. As of now my project has existing unit tests, but no unit test is written for the database layer or for any function which inserts/updates/deletes data in the database.
We are using Microsoft tests. One approach I can think of here is:
1) We create the database on the fly, i.e. an .mdf file, keep our default values ready in it, and in our setup method (NUnit) or initialize method (MS tests) we mock the objects and dump the dummy data into the tables.
Also, we are not using any mocking framework, so I am all confused.
I need to know how we can do this from scratch. Also, are there any options available for a mocking framework?
Any pointers or samples would be highly appreciated.
Thank you again.
A C# unit test should not touch the database; you should mock the database. It should be possible to execute many thousands of unit tests on your local machine, without anything external (internet, databases, other applications), within seconds (and you want to run them whenever you build your code).
That kind of leaves your question unanswered: what should your database layer tests do? It depends on what kind of logic you have in that assembly! If you have business or decision logic, you should test that; if you have mapping logic, test that. If all your database layer does is use (whatever DB framework) to put the load on your database, then you might not have anything there worth testing.
If you want to test logic performed by your database (say, stored procedures), you should do that in the database project, and most likely not using MSTest.
Of course you can use MSTest to set up and tear down the database and perform tests, but those tests will not be unit tests.

Patterns for loading default data when unit testing

Looking for some strategies for how you guys are loading default data when doing unit tests.
I use a builder that contains the default values, just like this: http://elegantcode.com/2008/04/26/test-data-builders-refined/. Then the test only specifies the value it cares about:
Customer customer = new CustomerBuilder()
    .WithFirstName("this test only cares about a special ' ... first name value");
After reading the other answers, I want to clarify that it's not for database data. It's for building the instances/data you pass to the classes you are unit testing.
It's a matter of convenience and of keeping the tests simple; plenty of times you are testing very specific behavior that depends on one to three fields, and you don't care about the rest of the fields.
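For reference, the same idea in Java might look like the sketch below; the Customer class and the builder's defaults are made up for the example:

    // Minimal stand-in domain class for the sketch.
    class Customer {
        final String firstName;
        final String lastName;
        Customer(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Test data builder: every field has a sensible default, so a test
    // only overrides the one or two values it actually asserts on.
    public class CustomerBuilder {
        private String firstName = "John";
        private String lastName = "Doe";

        public CustomerBuilder withFirstName(String firstName) {
            this.firstName = firstName;
            return this; // fluent style: each setter returns the builder
        }

        public CustomerBuilder withLastName(String lastName) {
            this.lastName = lastName;
            return this;
        }

        public Customer build() {
            return new Customer(firstName, lastName);
        }
    }

A test then reads as Customer customer = new CustomerBuilder().withFirstName("Ann").build(); with every unmentioned field falling back to its default.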
For unit testing I generally don't load data in advance - each test is designed to work against a data source that may or may not already contain existing records, and so each test writes any records that are needed to complete the test.
When choosing values to submit to the database I use GUIDs (or other random values) whenever possible as it guarantees that values in the database are unique (e.g. if you create someone named "Mr X Y", it is helpful to know that searching for "X" should return only 1 result, and that there is no chance you have chanced on someone else in the database whose last name happens to be Y)
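In Java, for instance, java.util.UUID provides such values with no extra dependencies; a minimal sketch:

    import java.util.UUID;

    public class UniqueTestData {
        // A random suffix guarantees the value cannot collide with rows
        // already in the database, so a search for it matches only the
        // record the test itself created.
        public static String uniqueLastName() {
            return "Smith-" + UUID.randomUUID();
        }
    }

A test inserts a record built from uniqueLastName() and can then assert that a search for that value returns exactly one result.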
Often when unit testing I'm testing methods that modify data alongside methods that read data, and so my unit tests use the same API (the one being tested) to write to the database. (It's nice if each unit test covers a specific area of functionality, but it's not absolutely necessary)
If the API being tested doesn't have methods to write to the database however, I write my own set of helper functions - the exact structure is going to depend on the data source, but as an example this is where I often use LINQ to SQL.
TDD is about testing a piece of code in isolation. One creates an instance of a class with its dependencies (or mocks of them), calls the method under test, and asserts to verify the outcome of the test.
Usually with TDD one starts with a simple test, without data. When data are needed, they are created in the test fixture (the isolated environment where the test is executed) by the test's setUp() method and then destroyed by the tearDown() method after the test has run. Data are not loaded from the database.
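In JUnit 4 terms the shape is roughly the following; this is a sketch with a trivial in-memory fixture, where @Before and @After play the setUp()/tearDown() roles:

    import static org.junit.Assert.assertEquals;

    import java.util.ArrayList;
    import java.util.List;

    import org.junit.After;
    import org.junit.Before;
    import org.junit.Test;

    public class ShoppingCartTest {

        private List<String> cartItems; // fixture data, created fresh for every test

        @Before
        public void setUp() {
            // Build the fixture in memory; nothing is loaded from a database.
            cartItems = new ArrayList<>();
            cartItems.add("book");
            cartItems.add("pen");
        }

        @After
        public void tearDown() {
            // Destroy the fixture so the next test starts from a clean slate.
            cartItems = null;
        }

        @Test
        public void fixtureContainsExactlyTheItemsCreatedInSetUp() {
            assertEquals(2, cartItems.size());
        }
    }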
The preferred strategy is in-transaction data. Spring offers extensive support for this (for both JUnit 3 and 4). With this strategy your test begins a brand new transaction each time, and your data is rolled back at the end of the test.
Of course sometimes it's not enough: either the data set is too extensive and shared across tests, or multiple transactions are part of the test scope. In that case, I recommend creating a shared test data bed that is built before running the test suite. There are frameworks for this (DbUnit), but you can also do without them if you are careful and consistent.
UPD: creating in-transaction data doesn't mean you don't need test data; you are likely to end up creating reusable and shared helper classes to maintain test data in all cases.
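With Spring's TestContext support (JUnit 4 style) the rollback comes for free; a sketch, assuming a Spring config file that defines a DataSource and a JdbcTemplate bean, and a CUSTOMER table that exists in the test schema:

    import org.junit.Assert;
    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.test.context.ContextConfiguration;
    import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;
    import org.springframework.transaction.annotation.Transactional;

    @RunWith(SpringJUnit4ClassRunner.class)
    @ContextConfiguration("classpath:test-context.xml") // hypothetical config file
    @Transactional // each test runs in its own transaction, rolled back at the end
    public class CustomerDataTest {

        @Autowired
        private JdbcTemplate jdbcTemplate;

        @Test
        public void insertedRowIsVisibleOnlyInsideThisTransaction() {
            jdbcTemplate.update("INSERT INTO CUSTOMER (ID, NAME) VALUES (?, ?)", 1, "Ann");
            int count = jdbcTemplate.queryForObject(
                    "SELECT COUNT(*) FROM CUSTOMER WHERE ID = ?", Integer.class, 1);
            // The automatic rollback after the test removes the row, leaving
            // the database exactly as it was found.
            Assert.assertEquals(1, count);
        }
    }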
I typically have methods like GetCustomer() that return a generic customer. If I need to make the returned customer suit my needs for a particular test, I will simply change the property after it gets returned.
Other times I may pass some configuration information into my GetCustomer() method. For example GetCustomer(string customerType).
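A Java sketch of those two helpers; the Customer class and the "premium" type are made up for the example:

    // Minimal stand-in domain class for the sketch.
    class Customer {
        String type = "regular";
        String name = "Generic Customer";
    }

    public class TestCustomers {

        // Returns a generic customer; a test tweaks properties afterwards as needed.
        public static Customer getCustomer() {
            return new Customer();
        }

        // Overload that accepts configuration, e.g. getCustomer("premium").
        public static Customer getCustomer(String customerType) {
            Customer customer = getCustomer();
            customer.type = customerType;
            return customer;
        }
    }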
I've read experts' opinions saying that each test should contain its own unique data to work with rather than trying to make the data generic. Even though this may make each test "larger" in size, overall it will make the tests clearer, because the setup is specific to each test and to the goals of each test. I like this advice because I've run into many cases where trying to make the setup data generic made things very sloppy very quickly.

Database data needed in integration tests; created by API calls or using imported data?

This question is more or less programming language agnostic. However, as I'm mostly into Java these days, that's where I'll draw my examples from. I'm also thinking about the OOP case, so if you want to test a method you need an instance of that method's class.
A core rule for unit tests is that they should be autonomous, and that can be achieved by isolating a class from its dependencies. There are several ways to do it and it depends on if you inject your dependencies using IoC (in the Java world we have Spring, EJB3 and other frameworks/platforms which provide injection capabilities) and/or if you mock objects (for Java you have JMock and EasyMock) to separate a class being tested from its dependencies.
If we need to test groups of methods in different classes* and see that they are well integrated, we write integration tests. And here is my question!
At least in web applications, state is often persisted to a database. We could use the same tools as for unit tests to achieve independence from the database. But in my humble opinion I think that there are cases when not using a database for integration tests is mocking too much (but feel free to disagree; not using a database at all, ever, is also a valid answer as it makes the question irrelevant).
When you use a database for integration tests, how do you fill that database with data? I can see two approaches:
Store the database contents for the integration test and load it before starting the test. Whether it's stored as an SQL dump, a database file, XML or something else would be interesting to know.
Create the necessary database structures by API calls. These calls are probably split up into several methods in your test code and each of these methods may fail. It could be seen as your integration test having dependencies on other tests.
How are you making certain that database data needed for tests is there when you need it? And why did you choose the method you chose?
Please provide an answer with a motivation, as it's in the motivation the interesting part lies. Remember that just saying "It's best practice!" isn't a real motivation, it's just re-iterating something you've read or heard from someone. For that case please explain why it's best practice.
*I'm including one method calling other methods in (the same or other) instances of the same class in my definition of unit test, even though it might technically not be correct. Feel free to correct me, but let's keep it as a side issue.
I prefer creating the test data using API calls.
In the beginning of the test, you create an empty database (in-memory or the same that is used in production), run the install script to initialize it, and then create whatever test data is used by the tests. Creation of the test data may be organized for example with the Object Mother pattern, so that the same data can be reused in many tests, possibly with minor variations.
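An Object Mother is just a factory of canonical, ready-made test objects; a minimal Java sketch with a made-up Customer class:

    // Object Mother: one place that knows how to produce canonical test
    // objects, so many tests share the same baseline data.
    public class CustomerMother {

        public static Customer regularCustomer() {
            Customer c = new Customer();
            c.name = "Jane Doe";
            c.email = "jane.doe@example.com";
            return c;
        }

        // A named variation for tests that exercise missing data.
        public static Customer customerWithoutEmail() {
            Customer c = regularCustomer();
            c.email = null;
            return c;
        }
    }

    // Minimal stand-in domain class for the sketch.
    class Customer {
        String name;
        String email;
    }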
You want to have the database in a known state before every test, in order to have reproducible tests without side effects. So when a test ends, you should drop the test database or roll back the transaction, so that the next test can recreate the test data always the same way, regardless of whether the previous tests passed or failed.
The reason why I would avoid importing database dumps (or similar) is that it couples the test data with the database schema. When the database schema changes, you would also need to change or recreate the test data, which may require manual work.
If the test data is specified in code, you will have the power of your IDE's refactoring tools at hand. When you make a change which affects the database schema, it will probably also affect the API calls, so you will anyway need to refactor the code using the API. With nearly the same effort you can also refactor the creation of the test data - especially if the refactoring can be automated (renames, introducing parameters, etc.). But if the tests rely on a database dump, you would need to manually refactor the database dump in addition to refactoring the code which uses the API.
Another thing related to integration testing the database, is testing that upgrading from a previous database schema works right. For that you might want to read the book Refactoring Databases: Evolutionary Database Design or this article: http://martinfowler.com/articles/evodb.html
In integration tests, you need to test with a real database, as you have to verify that your application can actually talk to the database. Isolating the database as a dependency means that you are postponing the real test of whether your database was deployed properly, your schema is as expected and your app is configured with the right connection string. You don't want to find problems with these when you deploy to production.
You also want to test with both pre-created data sets and an empty data set. You need to test the path where your app starts with an empty database containing only your default initial data and begins creating and populating data, and also the paths with well-defined data sets that target specific conditions you want to test, like stress, performance and so on.
Also, make sure that you have the database in a well-known state before each test. You don't want to have dependencies between your integration tests.
Why are these two approaches presented as mutually exclusive?
I can't see any viable argument for not using pre-existing data sets, especially particular data that has caused problems in the past.
I can't see any viable argument for not programmatically extending that data with all the possible conditions that you can imagine causing problems, and even a bit of random data for integration testing.
In modern agile approaches, unit tests are where it really matters that the same tests are run each time. This is because unit tests are aimed not at finding bugs but at preserving the functionality of the app as it is developed, allowing the developer to refactor as needed.
Integration tests, on the other hand, are designed to find the bugs you did not expect. Running with some different data each time can even be good, in my opinion. You just have to make sure your test preserves the failing data if you get a failure. Remember, in formal integration testing, the application itself will be frozen except for bug fixes, so your tests can be changed to test for the maximum possible number and kinds of bugs. In integration, you can and should throw the kitchen sink at the app.
As others have noted, of course, all this naturally depends on the kind of application that you are developing and the kind of organization you are in, etc.
It sounds like your question is actually two questions: should you exclude the database from your testing, and when you do use a database, how should you generate the data in it?
When possible I prefer to use an actual database. Frequently the queries (written in SQL, HQL, etc.) in CRUD classes can return surprising results when confronted with an actual database. It's better to flush out these issues early on. Often developers will write very thin unit tests for CRUD, testing only the most benign cases. Using an actual database for your testing can exercise all kinds of corner cases you may not even have been aware of.
That being said, there can be other issues. After each test you want to return your database to a known state. At my current job we nuke the database by executing all the DROP statements and then completely recreating all the tables from scratch. This is extremely slow on Oracle, but can be very fast if you use an in-memory database like HSQLDB. When we need to flush out Oracle-specific issues we just change the database URL and driver properties and then run against Oracle. If you don't have this kind of database portability, Oracle also has some kind of database snapshot feature which can be used specifically for this purpose: rolling back the entire database to some previous state. I'm not sure what other databases have.
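Creating that fast in-memory instance is a one-liner on the JDBC URL; a sketch of the recreate-from-scratch approach, with the schema DDL abbreviated to a single table:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class InMemoryDbTestSupport {

        // "mem:" keeps the whole database in memory: fast to create,
        // and it disappears when the JVM (or the database) shuts down.
        static Connection openFreshDatabase() throws Exception {
            Connection conn =
                    DriverManager.getConnection("jdbc:hsqldb:mem:testdb", "sa", "");
            try (Statement stmt = conn.createStatement()) {
                // Recreate the schema from scratch before each test run.
                stmt.execute("DROP TABLE CUSTOMER IF EXISTS");
                stmt.execute("CREATE TABLE CUSTOMER (ID INT PRIMARY KEY, NAME VARCHAR(100))");
            }
            return conn;
        }
    }

Swapping the URL and driver properties for Oracle's, as the answer describes, runs the same tests against the real database.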
Depending on what kind of data will be in your database, the API approach or the load approach may work better or worse. When you have highly structured data with many relations, APIs will make your life easier by making the relations in your data explicit, and it will be harder for you to make a mistake when creating your test data set. As mentioned by other posters, refactoring tools can take care of some of the changes to the structure of your data automatically. Often I find it useful to think of API-generated test data as composing a scenario: a user/system has done steps X, Y, Z, and the tests go from there. These states can be achieved because you can write a program that calls the same API your user would use.
Loading data becomes much more important when you need large volumes of data, when you have few relations within your data, or when there is consistency in the data that cannot be expressed using APIs or standard relational mechanisms. At one job I worked at, my team was writing the reporting application for a large network packet inspection system. The volume of data was quite large for the time. In order to trigger a useful subset of test cases we really needed test data generated by the sniffers; that way, correlations between the information about one protocol would line up with information about another protocol. That was difficult to capture through the API.
Most databases have tools to import and export delimited text files of tables, but often you only want subsets of them, which makes using data dumps more complicated. At my current job we need to take some dumps of actual data which gets generated by Matlab programs and stored in the database. We have a tool which can dump a subset of the database data and then compare it with the "ground truth" for testing. It seems our extraction tools are constantly being modified.
I've used DbUnit to take snapshots of records in a database and store them in XML format. Then my unit tests (we called them integration tests when they required a database) can wipe and restore from the XML file at the start of each test.
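The wipe-and-restore step looks roughly like this with the DbUnit 2.4-style API; the JDBC connection and the snapshot file name are assumptions:

    import java.io.FileInputStream;
    import java.sql.Connection;

    import org.dbunit.database.DatabaseConnection;
    import org.dbunit.database.IDatabaseConnection;
    import org.dbunit.dataset.IDataSet;
    import org.dbunit.dataset.xml.FlatXmlDataSetBuilder;
    import org.dbunit.operation.DatabaseOperation;

    public class DbUnitSupport {

        // Delete the rows of every table named in the dataset, then reload
        // the snapshot rows, giving each test the same known starting state.
        static void restoreSnapshot(Connection jdbcConnection) throws Exception {
            IDatabaseConnection connection = new DatabaseConnection(jdbcConnection);
            IDataSet dataSet = new FlatXmlDataSetBuilder()
                    .build(new FileInputStream("snapshot.xml")); // hypothetical file
            DatabaseOperation.CLEAN_INSERT.execute(connection, dataSet);
        }
    }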
I'm undecided whether this is worth the effort. One problem is dependencies on other tables. We left static reference tables alone, and built some tools to detect and extract all child tables along with the requested records. I read someone's recommendation to disable all foreign keys in your integration test database. That would make it way easier to prepare the data, but you're no longer checking for any referential integrity problems in your tests.
Another problem is database schema changes. We wrote some tools that would add default values for columns that had been added since the snapshots were taken.
Obviously these tests were way slower than pure unit tests.
When you're trying to test some legacy code where it's very difficult to write unit tests for individual classes, this approach may be worth the effort.
I do both, depending on what I need to test:
I import static test data from SQL scripts or DB dumps. This data is used in object load (deserialization or object mapping) and in SQL query tests (when I want to know whether the code will return the correct result).
Plus, I usually have some backbone data (config, value to name lookup tables, etc). These are also loaded in this step. Note that this loading is a single test (along with creating the DB from scratch).
When I have code which modifies the DB (object -> DB), I usually run it against a live DB (in memory or a test instance somewhere). This is to ensure that the code works, not to create any large number of rows. After the test, I roll back the transaction (following the rule that tests must not modify the global state).
Of course, there are exceptions to the rule:
I also create large numbers of rows in performance tests.
Sometimes, I have to commit the result of a unit test (otherwise, the test would grow too big).
I generally use SQL scripts to fill the data in the scenario you discuss.
It's straightforward and very easily repeatable.
This will probably not answer all your questions, but I made the decision in one project to do unit testing against the DB. I felt in my case that the DB structure needed testing too, i.e. did my DB design deliver what was needed for the application? Later in the project, when I feel the DB structure is stable, I will probably move away from this.
To generate data I decided to create an external application that filled the DB with "random" data; I created person-name and company-name generators, etc.
The reason for doing this in an external program was:
1. I could rerun the tests on data the tests had already modified, i.e. making sure my tests were able to run several times and that the data modifications made by the tests were valid modifications.
2. I could if needed, clean the DB and get a fresh start.
I agree that there are points of failure in this approach, but in my case, since e.g. person generation was part of the business logic, generating data for tests was actually testing that part too.
Our team confronted the same question recently.
Previously, we were using SpecFlow to do integration testing. With SpecFlow, QA can write each test case, populating the necessary test data into the DB inside it.
Now QA want to use Postman to test the API; how can they populate the data? One solution is creating APIs for populating it. Another is syncing historical data from prod to the test environment.
I will update my answer once we try different solutions and decide which one to go with.

Strategies for Using Mock Objects when Unit Testing DAOs

I am curious what strategies folks have found for unit testing a data access class without loading (and presumably unloading) a real database for each test method. Are you using mock objects to represent the database connection? If so, are you required to pass the mock object into every method under test, thus forcing the API to require a real DB connection as a parameter to every method? Or are you passing a mock object into the constructor at setup()?
I have a class that is implementing what I believe is a Data Mapper (or maybe gateway) pattern. It is the class responsible for encapsulating SQL and returning (or saving) "business objects". The rest of the code can interact with this mapper layer and the business objects, with total disregard for the persistence model. This code needs to have/maintain, or just know about, a live db connection in the real system. Emulating this under test is tricky.
The problem is how to unit test one of these mapper classes. The practice for creating a unit test under xUnit that I have seen most often is using the setup() method of the test to instantiate the SUT (system under test), usually the object that you're testing, and store it in a local variable in the test class. Then each of your test methods interacts with a unique instance of that SUT.
The assumption though is that whatever you're doing in the setup() method will presumably be replicated somewhere in your real code. So, you have to think about the setup process as "is this something I will want to repeatedly reproduce every time I need to use this object in the real world." If I am passing a db connection into the mapper's constructor in the setup that's fine, but doesn't that mean I'll have to pass a live db connection into the mapper object's constructor every time I want to really use one? Imagine that you'll have all kinds of places where you need to retrieve or store a business object and that to use a data mapper object, you need to pass in the db connection every time?
In my case, I am trying to establish tests for these data mapper objects that achieve the following:
Do not require the database connection object to be instantiated and passed into every method of the mapper class.
Do not require that the test case either connect to a real db or create a real, but "test", db on the fly for each test method.
I have basically seen two suggestions: pass the connection object as a parameter (which I have already addressed), or extend the SUT class just for the test and override whatever DB connection setup process you have in the real world to use a mock system instead.
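The second suggestion, often called "subclass and override" (or Extract and Override), can be sketched in Java as follows; the mapper and its connection details are made up for the example:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    // Production class: the connection comes from one protected "seam" method.
    public class CustomerMapper {

        protected Connection openConnection() throws SQLException {
            // Real connection details live here (or come from configuration).
            return DriverManager.getConnection("jdbc:hsqldb:mem:appdb", "sa", "");
        }

        public String findName(int id) throws SQLException {
            try (Connection conn = openConnection()) {
                // ... run the SELECT and map the row (elided for brevity) ...
                return null;
            }
        }
    }

    // Test-only subclass: overrides just the seam, leaving the SQL/mapping
    // logic under test intact.
    class TestableCustomerMapper extends CustomerMapper {
        private final Connection testConnection;

        TestableCustomerMapper(Connection testConnection) {
            this.testConnection = testConnection;
        }

        @Override
        protected Connection openConnection() {
            return this.testConnection; // a mock or in-memory connection
        }
    }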
I am curious if anyone else is facing these issues, with any language, and what you have done to solve them? Maybe there is something obvious that I am missing?
In my experience, the responsibility for connecting to a database is a sore point in data access. I solved this by letting the DAO take care of that based on a configuration file (app.config, etc.). This way I don't need to worry about it when I write my tests. The DAL keeps one or more database connection profiles and connects/disconnects on every data access, because in the end the connection pool will take care of physically connecting/disconnecting.
Another thing that helped me was using dbUnit to load baseline data before running the tests. I found it easier to go straight to the database instead of using mock objects. Also by connecting to a real database I can (to a certain point) test concurrency by issuing commands in different threads - mock objects wouldn't give me the real behavior.
You can use DbUnit to test SQL
It depends on what you're really trying to test. If you want to test that your SQL does what you expect, that's really heading into Integration Test territory. Assuming you're using Java, there are several pure-java RDBMS solutions (Apache Derby, HSQLDB, H2) you can use for that.
If on the other hand you're really just testing your Java <-> JDBC code (i.e. reading from ResultSets), then you can mock out pretty much all the relevant parts of JDBC since they're mostly interfaces. JMock is great for this. Simply add a setConnection() method to your Class Under Test, and pass in the mocked java.sql.Connection that will do your bidding. This works really well for keeping tests short and sweet.
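A sketch of that approach with the JMock 2 API; the SQL is illustrative, and the JDBC conversation is driven inline here rather than through a real mapper class:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    import org.jmock.Expectations;
    import org.jmock.Mockery;
    import org.junit.Assert;
    import org.junit.Test;

    public class JdbcMockingTest {

        @Test
        public void resultSetValuesCanBeScriptedWithJMock() throws Exception {
            Mockery context = new Mockery();
            final Connection connection = context.mock(Connection.class);
            final PreparedStatement statement = context.mock(PreparedStatement.class);
            final ResultSet resultSet = context.mock(ResultSet.class);

            // Script the JDBC conversation the code under test should have.
            context.checking(new Expectations() {{
                oneOf(connection).prepareStatement("SELECT NAME FROM CUSTOMER WHERE ID = ?");
                will(returnValue(statement));
                oneOf(statement).setInt(1, 42);
                oneOf(statement).executeQuery(); will(returnValue(resultSet));
                oneOf(resultSet).next(); will(returnValue(true));
                oneOf(resultSet).getString("NAME"); will(returnValue("Jane"));
            }});

            // In a real test you would call setConnection(connection) on the
            // class under test and invoke the method being tested; the same
            // calls are made directly here to keep the sketch self-contained.
            PreparedStatement ps =
                    connection.prepareStatement("SELECT NAME FROM CUSTOMER WHERE ID = ?");
            ps.setInt(1, 42);
            ResultSet rs = ps.executeQuery();
            Assert.assertTrue(rs.next());
            Assert.assertEquals("Jane", rs.getString("NAME"));

            context.assertIsSatisfied(); // verify every scripted call happened
        }
    }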
Depending on how complex your database setup is, using an in-memory store might be a great option.
Normally I do my unit testing with an in-memory SQLite session. This is a full-blown database, 100% in memory: no files, no config needed. Just one line.
Now, this is not always an option. SQLite does not support all the SQL features of full-blown server databases. Normally I use a layer that tries to make my code database independent. In those cases I just switch to an in-memory database instance which I quickly create/destroy during every setUp/tearDown.
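From Java, for instance, that "one line" with the xerial sqlite-jdbc driver would look like this (a sketch; the driver choice is an assumption, since the answer doesn't name one):

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class SqliteTestSupport {

        // ":memory:" gives a private, fully in-memory database per connection:
        // no files, no configuration, gone as soon as the connection closes.
        static Connection openInMemorySession() throws Exception {
            return DriverManager.getConnection("jdbc:sqlite::memory:");
        }
    }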
Are you using any mid-layer to access your database? In most cases the greatest benefit of using that type of middleware is not database portability, but a simplified test harness.

Which object should I mock?

I am writing a repository. Fetching objects is done through a DAO. Creating and updating objects is done through a Request object, which is given to a RequestHandler object (a la Command pattern). I didn't write the DAO, Request, or RequestHandler, so I can't modify them.
I'm trying to write a test for this repository. I have mocked out both the DAO and RequestHandler. My goal is to have the mocked RequestHandler simply add the new or updated object to the mocked DAO. This will create the illusion that I'm talking to the DB. This way, I don't have to mock the repository for all the classes that call this repository.
The problem is that the Request object is this gob of string blobs and various alphanumeric codes. I'm pretty sure XML is involved too. It's sort of a mess. Another developer is writing the code to create the Request object based on the objects being stored. And since RequestHandler takes in Requests and not the object I'm storing, it can't update the mocked DAO.
So the question is: do I mock the Request too, or should I wait for the other guy, who is kind of slow, to finish his code before I write the test? Or screw it and mock out the entire repository when testing the classes that call the repository?
BTW, I say "mock" not in the NMock sense, but rather like faking the DB with an in-memory collection.
To test the repository I would suggest that you use test doubles for all of the lower layer objects.
To test the classes that depend on the repository I would suggest that you use test doubles for the repository.
In both cases I mean test doubles created by some mocking library (fakes where that works for the test, stubs where you need to return something to the object under test and mocks if you really have to).
If you are creating an implementation of the DAO using in-memory collections to functionally replace the database in a demo or test system, that is different from unit testing the upper layers. I have done something similar so that I could give prototypes to people and concentrate on business objects, not the physical model. That isn't for unit testing, though.
You may or may not be creating a web application, but you can have a look at the NerdDinner application, which uses a Repository. It is a free PDF that explains how to create an application using ASP.NET MVC and can be found here: Professional ASP.NET MVC 2.0