How to automate relationship testing in Doctrine2 entities and mappings?

My team and I have a large web application that we're porting to the Symfony framework, which includes Doctrine2 for data access. As we're just starting, we're wondering about regression testing for the entity relationships later on.
We have a large number of Doctrine entities that we automagically generated from the database schema and are now in the process of checking and tidying up each one, specifically in the area of relationship mapping. Once the mapping is done (using doc block annotations) and we confirm via simple scripts that the entities are performing correct data access, what is the best way to ensure that the mapping, and therefore the functionality of the entities, doesn't break in the future?
We are writing unit tests for all the entities as standalone units, mocking the required dependencies, but this doesn't necessarily protect us from an annotation being edited or the schema changing and breaking the current entity relationships.
Any thoughts? Anybody here had the same issues?

We have created what we term functional tests for every top-level entity, to test the relationships of its child entities. Basically, a top-level entity is instantiated and populated with mock data, then all of its entity dependencies are added, themselves populated with mock data. This is all then persisted to the database in one operation. The entity is then read back from the database and tested for equality against the original data. It's a bit more complicated than that (especially the equality test), but if any of the relationship mappings are ever changed or broken, this test immediately fails and gives us a cue to investigate.

Related

How to unit test entities with DatabaseGenerated fields?

I've inherited an application where every entity has a DateCreated field with the DatabaseGenerated attribute using DateTime.Now on inserts. While beneficial, this introduces problems when unit testing.
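For reference, the kind of mapping I'm describing looks roughly like this (the entity name and the exact DatabaseGeneratedOption are just illustrative):

using System;
using System.ComponentModel.DataAnnotations.Schema;

public class Order
{
    public int Id { get; set; }

    // The database supplies this value on insert (e.g. via a default constraint),
    // so the application code never sets it and tests can't easily control it.
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public DateTime DateCreated { get; set; }
}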
I'm a believer in not depending on a volatile resource like a real database during unit testing, so my preference would be to use an in-memory solution, but neither LocalDB nor SQLite supports the attribute without code changes. EF Core in Action seems to confirm that it's not possible with in-memory databases in general.
But even if I'm willing to look past an in-memory database, the business needs require a diverse set of created dates to be tested, and if I want to start fresh every test cycle (and I do), I can only generate data for DateTime.Now.
While I could write code that dumps real database data into application memory and alters the created dates as needed, that's another thing to maintain and support.
Is there another way I'm missing?
Well, I didn't come up with a better solution for mocking DateTime, but this satisfied my needs:
// Use reflection to overwrite the database-generated value with a back-dated timestamp
var triggerTimeStamp = user.GetType().GetProperty(nameof(User.DateCreated));
triggerTimeStamp.SetValue(user, DateTime.UtcNow - TimeSpan.FromHours(17));
I add this to my unit tests wherever I need to check some specific date-time-related logic.
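For instance (the business rule and the IsEstablished property here are made up, purely to show where the reflection trick slots in):

[TestMethod]
public void User_created_more_than_a_day_ago_is_considered_established()
{
    var user = new User();

    // Back-date the database-generated field so age-dependent logic can be exercised.
    var dateCreated = user.GetType().GetProperty(nameof(User.DateCreated));
    dateCreated.SetValue(user, DateTime.UtcNow - TimeSpan.FromDays(2));

    Assert.IsTrue(user.IsEstablished);   // hypothetical property derived from DateCreated
}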

Does anybody have experience of using SQLite to write integration tests?

We're using MVC, Entity Framework 4.1 Code First, SQL Server in our project.
Please share your experience: how do you unit test your data service layer? By data service layer I mean services intended to be called by MVC controllers that declare some kind of DbContext-derived class inside, so that they depend on this EF DbContext and encapsulate some business/data logic to fetch and store the data.
After reading a few articles and posts, I'm inclined to build unit/integration tests against a separate database, and I'd prefer an in-memory one (like SQLite) over SQL Compact. However, I'm not even sure it's possible; if you have such experience, could you please share a few lines of code showing how you achieve this?
Unit testing means testing a unit: no database, no external dependencies, just a single testable unit. Once you involve the database you are no longer unit testing - you are doing integration testing.
I wrote multiple answers about unit testing / integration testing of code dependent on EF. The last one is here. So if your service layer creates linq queries on context you cannot reliably unit test them. You need integration tests.
I would use the same database that you expect to use in your real code. Why? Because mapping and behaviour can differ between database providers, as can the LINQ implementation. Also, with SQL Server you can use special EF features which may not be available in SQLite. Another reason is that, last time I checked, SQLite's provider didn't support database deletion, recreation, etc., which is something people usually want for integration tests. A solution for that can be the Devart provider.
I don't use a separate database at all. In fact, my Unit Tests don't use a database at all.
My strategy is to create IEntityRepository interfaces for the DB entities (replace Entity with the actual name). I then pass those to the constructors of my controllers.
During unit testing, I simply use a mocking library to pass mock implementations of the repositories that I need and have them return some set of known data that I can use in the unit tests.
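As a rough sketch (the entity, interface and controller names are just examples, and I happen to use Moq here, though any mocking library works the same way):

using System.Collections.Generic;
using System.Linq;
using System.Web.Mvc;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using Moq;

public class Order
{
    public int Id { get; set; }
}

public interface IOrderRepository
{
    IEnumerable<Order> GetAll();
}

public class OrdersController : Controller
{
    private readonly IOrderRepository _orders;

    // The controller depends only on the interface, never on a DbContext.
    public OrdersController(IOrderRepository orders)
    {
        _orders = orders;
    }

    public ActionResult Index()
    {
        return View(_orders.GetAll());
    }
}

[TestClass]
public class OrdersControllerTests
{
    [TestMethod]
    public void Index_returns_known_orders()
    {
        // The mock stands in for the repository and returns known data.
        var mock = new Mock<IOrderRepository>();
        mock.Setup(r => r.GetAll()).Returns(new[] { new Order { Id = 1 }, new Order { Id = 2 } });

        var controller = new OrdersController(mock.Object);
        var result = (ViewResult)controller.Index();

        Assert.AreEqual(2, ((IEnumerable<Order>)result.ViewData.Model).Count());
    }
}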

EntityTypeConfiguration - What is a clean method to test mapping against the database?

Background:
My company's current structure uses Plinqo/LINQ to SQL to create "data access objects", and then a custom set of CodeSmith templates to build "business objects". To make a very long story short, these two sets of objects are very tightly coupled and, with LINQ to SQL, lead to pretty ugly workarounds.
The Plinqo templates do a direct 1:1 mapping of table to class after generating the dbml. This leads to some comfort in that if the database changes, there is a compile-time error on the business object side (or application side).
I am slowly trying to prove out the benefits of EF 4.1 (Code First) to map to the existing schema, but this "type safety" of code generation has come up as a big issue in a key stakeholder's mind.
Problem:
So in Entity Framework 4.1, I am using Code First to map to the existing database:
POCO domain objects
An EntityTypeConfiguration for each mapping
What would you suggest as a test project for ensuring that the mapping to the schema is sound? Should I just create a unit test project and do retrievals of each object or is there a better way?
Thanks!
I used one base generic integration test performing CRUD operations; derived tests only contained methods for creating the entity and validating the results of each operation. Each test method ran in a transaction scope which was never committed, so the test database stayed in its initial state.
This can be further improved in scenarios where you start to use repositories and, instead of working with single entity types, work with aggregate roots. In such cases, integration tests that manipulate the aggregate roots correctly are very handy.
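The skeleton of the base test looked roughly like this (MSTest attributes are used just for illustration, and MyDbContext stands in for whatever DbContext-derived class you have):

using System.Linq;
using System.Transactions;
using Microsoft.VisualStudio.TestTools.UnitTesting;

public abstract class CrudTestBase<TEntity> where TEntity : class
{
    private TransactionScope _scope;
    protected MyDbContext Context;                               // assumption: your DbContext-derived class

    protected abstract TEntity CreateEntity();                   // derived test supplies the entity...
    protected abstract void AssertPersisted(TEntity reloaded);   // ...and validates the result

    [TestInitialize]
    public void SetUp()
    {
        // Everything a test does happens inside this scope; Complete() is never called,
        // so disposing the scope rolls the database back to its initial state.
        _scope = new TransactionScope();
        Context = new MyDbContext();
    }

    [TestCleanup]
    public void TearDown()
    {
        Context.Dispose();
        _scope.Dispose();
    }

    [TestMethod]
    public void Create_and_read_back()
    {
        var entity = CreateEntity();
        Context.Set<TEntity>().Add(entity);
        Context.SaveChanges();

        // Reload from the database rather than the local cache so the mapping is exercised.
        var reloaded = Context.Set<TEntity>().AsNoTracking().First();
        AssertPersisted(reloaded);
    }
}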

NHibernate ISession.Replicate with SQLite and native id generation

We are mapping the primary key of an object like this:
Id(x => x.Id, "ID").GeneratedBy.Native("SEQUENCENAME");
We have business logic that depends on certain ids existing (legacy, not easily changed). New objects should get generated ids from an Oracle sequence, but there are always rows with known ids.
We're using SQLite for unit testing and I need to persist new objects to the in-memory database with these known ids. This will not work with any of the following methods:
session.Replicate(objectWithKnownId, <any replication mode>);
session.Merge(objectWithKnownId);
According to the NHibernate documentation, the Replicate method seems to be what I'm looking for:
Persist all reachable transient objects, reusing the current identifier values.
When using it with SQLite, however, I will only get generated ids. Can anyone think of a good way of solving this?
I typically run any database tests against the database that I'm running the app against - SQLite can be good for quick tests but it is just missing too many of the features that you'll find in a full blown DBMS. You may be able to use a method like the one discussed here to tweak your mappings at runtime if it is a mapping issue.
You could also preload a SQLite database with the entities you need, and copy this in for reuse every time you run the test. This is probably the route I would take for something like this, but I can't offer any technical details on how to do it.
To be honest it sounds a bit strange to have your business logic depend on certain Id's - I would think you'd want it to depend on certain entities - you could then insert these entities and store their generated Id's for the duration of your tests.
After looking into this problem and reading the response from AlexCuse (+1 to his answer), I deemed it was not possible to use the native id generator in this case. I needed both unit tests that save rows with known ids in test setups and tests that insert with autogenerated ids.
One option was to have some sort of check in the fluent mapping that would use GeneratedBy.Native("SEQUENCENAME") in production code and GeneratedBy.Assigned in tests, but I didn't like the idea of having differences related to NHibernate mappings between unit tests and production.
What I opted for in the end was to handle this in the repository. I have an Add method in the relevant repository and this will handle assigning a generated id from a sequence if the id isn't already set, something like this:
public void Add(TheClass newObject)
{
    // If the id hasn't already been assigned, fetch the next value from the Oracle sequence
    if (newObject.Id == 0)
    {
        newObject.Id = sequenceGenerator.GetNextValue("SEQUENCENAME");
    }
    session.Save(newObject);
}
In unit tests I inject a mock sequence generator into the repository. You could argue that this is similar to the approach of having different mappings for unit tests and production code, but I think this approach keeps the difference a bit more isolated. The most important reason, though, is that it allows me to use both assigned and automatically generated ids in unit tests as well.
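The generator sits behind a small interface, so the test double is trivial (the interface and class names here are mine; adjust to your own setup):

public interface ISequenceGenerator
{
    long GetNextValue(string sequenceName);
}

// In production this is backed by the Oracle sequence; in unit tests a simple fake is enough.
public class FakeSequenceGenerator : ISequenceGenerator
{
    private long _next = 1000;

    public long GetNextValue(string sequenceName)
    {
        return _next++;
    }
}

// e.g. in a test fixture (repository name is hypothetical):
// var repository = new TheClassRepository(session, new FakeSequenceGenerator());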

Database data needed in integration tests; created by API calls or using imported data?

This question is more or less programming-language agnostic. However, as I'm mostly into Java these days, that's where I'll draw my examples from. I'm also thinking about the OOP case, so if you want to test a method you need an instance of that method's class.
A core rule for unit tests is that they should be autonomous, and that can be achieved by isolating a class from its dependencies. There are several ways to do it, and it depends on whether you inject your dependencies using IoC (in the Java world we have Spring, EJB3 and other frameworks/platforms which provide injection capabilities) and/or whether you mock objects (for Java you have JMock and EasyMock) to separate a class being tested from its dependencies.
If we need to test groups of methods in different classes* and see that they integrate well, we write integration tests. And here is my question!
At least in web applications, state is often persisted to a database. We could use the same tools as for unit tests to achieve independence from the database. But in my humble opinion I think that there are cases when not using a database for integration tests is mocking too much (but feel free to disagree; not using a database at all, ever, is also a valid answer as it makes the question irrelevant).
When you use a database for integration tests, how do you fill that database with data? I can see two approaches:
Store the database contents for the integration test and load them before starting the test. Whether it's stored as an SQL dump, a database file, XML or something else would be interesting to know.
Create the necessary database structures by API calls. These calls are probably split up into several methods in your test code and each of these methods may fail. It could be seen as your integration test having dependencies on other tests.
How are you making certain that the database data needed for tests is there when you need it? And why did you choose the method you chose?
Please provide an answer with a motivation, as it's in the motivation the interesting part lies. Remember that just saying "It's best practice!" isn't a real motivation, it's just re-iterating something you've read or heard from someone. For that case please explain why it's best practice.
*I'm including one method calling other methods in (the same or other) instances of the same class in my definition of unit test, even though it might technically not be correct. Feel free to correct me, but let's keep it as a side issue.
I prefer creating the test data using API calls.
In the beginning of the test, you create an empty database (in-memory or the same one that is used in production), run the install script to initialize it, and then create whatever test data the test uses. Creation of the test data may be organized, for example, with the Object Mother pattern, so that the same data can be reused in many tests, possibly with minor variations.
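For example, an Object Mother can be as simple as a class with factory methods for the common cases and their variations (sketched here in C#; the User class and its fields are made up):

public class User
{
    public string Name { get; set; }
    public string Email { get; set; }
    public bool IsActive { get; set; }
}

public static class TestUsers
{
    // The canonical baseline user that most tests can share.
    public static User Default()
    {
        return new User { Name = "Alice Example", Email = "alice@example.com", IsActive = true };
    }

    // A minor variation reusing the same baseline.
    public static User Deactivated()
    {
        var user = Default();
        user.IsActive = false;
        return user;
    }
}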
You want to have the database in a known state before every test, in order to have reproducible tests without side effects. So when a test ends, you should drop the test database or roll back the transaction, so that the next test can recreate the test data the same way, regardless of whether the previous tests passed or failed.
The reason why I would avoid importing database dumps (or similar), is that it will couple the test data with the database schema. When the database schema changes, you would also need to change or recreate the test data, which may require manual work.
If the test data is specified in code, you will have the power of your IDE's refactoring tools at hand. When you make a change which affects the database schema, it will probably also affect the API calls, so you will need to refactor the code using the API anyway. With nearly the same effort you can also refactor the creation of the test data - especially if the refactoring can be automated (renames, introducing parameters etc.). But if the tests rely on a database dump, you would need to manually refactor the database dump in addition to refactoring the code which uses the API.
Another thing related to integration testing the database, is testing that upgrading from a previous database schema works right. For that you might want to read the book Refactoring Databases: Evolutionary Database Design or this article: http://martinfowler.com/articles/evodb.html
In integration tests, you need to test with a real database, as you have to verify that your application can actually talk to the database. Isolating the database as a dependency means that you are postponing the real test of whether your database was deployed properly, your schema is as expected and your app is configured with the right connection string. You don't want to discover these problems when you deploy to production.
You also want to test with both pre-created data sets and an empty data set. You need to test the path where your app starts with an empty database containing only your default initial data and begins creating and populating data, and also the path with well-defined data sets that target specific conditions you want to test, like stress, performance and so on.
Also, make sure that you have the database in a well-known state before each test. You don't want to have dependencies between your integration tests.
Why are these two approaches presented as mutually exclusive?
I can't see any viable argument for not using pre-existing data sets, especially particular data that has caused problems in the past.
I can't see any viable argument for not programmatically extending that data with all the possible conditions that you can imagine causing problems, and even a bit of random data for integration testing.
In modern agile approaches, Unit tests are where it really matters that the same tests are run each time. This is because unit tests are aimed not at finding bugs but at preserving the functionality of the app as it is developed, allowing the developer to refactor as needed.
Integration tests, on the other hand, are designed to find the bugs you did not expect. Running with some different data each time can even be good, in my opinion. You just have to make sure your test preserves the failing data if you get a failure. Remember, in formal integration testing, the application itself will be frozen except for bug fixes, so your tests can be changed to test for the maximum possible number and kinds of bugs. In integration, you can and should throw the kitchen sink at the app.
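One lightweight way to get somewhat different data on each run while still preserving a failure's data is to randomize through a logged seed, so a failing run can be replayed exactly (an illustrative sketch, not tied to any particular framework):

using System;

public static class RandomTestData
{
    public static Random CreateRandom()
    {
        // New seed on every run, but printed so a failing run can be reproduced with the same data.
        var seed = Environment.TickCount;
        Console.WriteLine("Integration test data seed: " + seed);
        return new Random(seed);
    }
}

// e.g. var rng = RandomTestData.CreateRandom();
//      order.Quantity = rng.Next(1, 1000);   // randomized values for the extra rows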
As others have noted, of course, all this naturally depends on the kind of application that you are developing and the kind of organization you are in, etc.
It sounds like your question is actually two questions: Should you exclude the database from your testing? And when you do use a database, how should you generate the data in it?
When possible I prefer to use an actual database. Frequently the queries (written in SQL, HQL, etc.) in CRUD classes can return surprising results when confronted with an actual database. It's better to flush these issues out early on. Often developers will write very thin unit tests for CRUD, testing only the most benign cases. Using an actual database for your testing can cover all kinds of corner cases you may not have even been aware of.
That being said, there can be other issues. After each test you want to return your database to a known state. At my current job we nuke the database by executing all the DROP statements and then completely recreating all the tables from scratch. This is extremely slow on Oracle, but can be very fast if you use an in-memory database like HSQLDB. When we need to flush out Oracle-specific issues we just change the database URL and driver properties and then run against Oracle. If you don't have this kind of database portability, then Oracle also has some kind of database snapshot feature which can be used specifically for this purpose: rolling back the entire database to some previous state. I'm not sure what other databases have.
Depending on what kind of data will be in your database, the API or the load approach may work better or worse. When you have highly structured data with many relations, APIs will make your life easier by making the relations between your data explicit. It will be harder for you to make a mistake when creating your test data set. As mentioned by other posters, refactoring tools can take care of some of the changes to the structure of your data automatically. Often I find it useful to think of API-generated test data as composing a scenario: when a user/system has done steps X, Y, Z, the tests will go from there. These states can be achieved because you can write a program that calls the same API your user would use.
Loading data becomes much more important when you need large volumes of data, when you have few relations within your data, or when there is consistency in the data that cannot be expressed using APIs or standard relational mechanisms. At one job I worked at, my team was writing the reporting application for a large network packet inspection system. The volume of data was quite large for the time. In order to trigger a useful subset of test cases we really needed test data generated by the sniffers, so that information about one protocol would correlate correctly with information about another protocol. It was difficult to capture this in the API.
Most databases have tools to import and export delimited text files of tables. But often you only want subsets of them, which makes using data dumps more complicated. At my current job we need to take dumps of actual data which is generated by Matlab programs and stored in the database. We have a tool which can dump a subset of the database data and then compare it with the "ground truth" for testing. It seems our extraction tools are being constantly modified.
I've used DBUnit to take snapshots of records in a database and store them in XML format. Then my unit tests (we called them integration tests when they required a database), can wipe and restore from the XML file at the start of each test.
I'm undecided whether this is worth the effort. One problem is dependencies on other tables. We left static reference tables alone, and built some tools to detect and extract all child tables along with the requested records. I read someone's recommendation to disable all foreign keys in your integration test database. That would make it way easier to prepare the data, but you're no longer checking for any referential integrity problems in your tests.
Another problem is database schema changes. We wrote some tools that would add default values for columns that had been added since the snapshots were taken.
Obviously these tests were way slower than pure unit tests.
When you're trying to test some legacy code where it's very difficult to write unit tests for individual classes, this approach may be worth the effort.
I do both, depending on what I need to test:
I import static test data from SQL scripts or DB dumps. This data is used in object load (deserialization or object mapping) and in SQL query tests (when I want to know whether the code will return the correct result).
Plus, I usually have some backbone data (config, value-to-name lookup tables, etc.). These are also loaded in this step. Note that this loading is a single test (along with creating the DB from scratch).
When I have code which modifies the DB (object -> DB), I usually run it against a living DB (in-memory, or a test instance somewhere). This is to ensure that the code works, not to create any large amount of rows. After the test, I roll back the transaction (following the rule that tests must not modify the global state).
Of course, there are exceptions to the rule:
I also create large numbers of rows in performance tests.
Sometimes, I have to commit the result of a unit test (otherwise, the test would grow too big).
I generally use SQL scripts to fill the data in the scenario you discuss.
It's straightforward and very easily repeatable.
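In .NET terms, for example, replaying such a script before a test run can be as simple as this (the file name and connection string are placeholders, and the script is assumed to contain plain statements without GO separators):

using System.Data.SqlClient;
using System.IO;

public static class TestDatabase
{
    public static void Seed(string connectionString)
    {
        // The script typically deletes existing rows and re-inserts the known test data.
        var sql = File.ReadAllText("seed-test-data.sql");

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}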
This will probably not answer all your questions, if any, but I made the decision in one project to do unit testing against the DB. I felt that, in my case, the DB structure needed testing too, i.e. did my DB design deliver what is needed for the application? Later in the project, when I feel the DB structure is stable, I will probably move away from this.
To generate data I decided to create an external application that filled the DB with "random" data; I created person-name and company-name generators, etc.
The reasons for doing this in an external program were:
1. I could rerun the tests on data already modified by the tests, i.e. making sure my tests were able to run several times and that the data modifications made by the tests were valid.
2. I could, if needed, clean the DB and get a fresh start.
I agree that there are points of failure in this approach, but in my case, since e.g. person generation was part of the business logic, generating data for tests was actually testing that part too.
Our team confronted the same question recently.
Previously, we were using SpecFlow for integration testing. With SpecFlow, QA can write each test case and populate the necessary test data into the DB inside it.
Now QA want to use Postman to test the API; how can they populate the data? One solution is creating APIs for populating it. Another is syncing historical data from prod to the test environment.
I will update my answer once we try different solutions and decide which one to go with.